The present invention relates to a method for creating a reference model list for a voice recognition system in a voice-controlled communication appliance.
The present invention likewise relates to an apparatus for creating reference models for voice recognition systems, having a conversion part that performs word/model conversion on command words in text form to produce reference models for use in voice recognition systems.
The present invention also relates to a voice-controlled communication appliance having a voice recognition system with a memory for a reference model list.
In this context, a communication appliance is understood to be an appliance intended for transmitting or processing speech and/or text, such as a terminal in a telecommunication network or a word processing system. Use of the appliance in this sense (i.e. transmitting or processing speech and/or text) must be distinguished from control of the appliance. A voice-controlled communication appliance is therefore a communication appliance that receives commands in spoken form, processes them and performs the corresponding operations. Voice recognition systems for processing and recognizing spoken language, and voice-controlled communication appliances as one of their applications, are currently a significant area of technical development.
The known voice recognition systems used in voice-controlled communication appliances can be divided into two groups: speaker-independent systems with a prescribed, fixed vocabulary, and speaker-dependent, configurable systems. A drawback of the former, speaker-independent systems is that they cannot be configured on an individual basis: once a particular word has been defined, e.g. as the command for a particular control function of the communication appliance, it cannot later be changed.
Individually configurable systems of the known type, on the other hand, have the drawback that use by different users, i.e. voice recognition for several speakers, is either not possible or possible only at a severe cost in recognition performance. A further drawback of configurable systems is that they must be trained. Since training involves recording and processing voice samples, it often requires considerable effort and time, particularly because of the demands it places on the surroundings, for example with regard to background noise.
A known approach to overcoming the speaker dependency of configurable voice recognition systems uses "user recognition". Here, users must identify themselves to the voice recognition system with a dedicated password, and only on the basis of this identification is the system able to recognize the words spoken by that user. Another known option is for the individual words to be trained by several users and for the voice recognition system to generate a shared model for each word from their utterances. Neither solution can dispense with training, however, and both therefore suffer from the drawbacks of training mentioned above. In addition, their use remains limited to the users who took part in the training.
Other known communication appliances use a hybrid form of speaker-independent and speaker-dependent voice recognition.
Here, a permanently prescribed basic vocabulary provides speaker-independent recognition, while speaker-dependent recognition can be used to configure an individual supplementary vocabulary. Even with this solution, however, recognition of the supplementary vocabulary is speaker-dependent, and training is still necessary.
DE 35 19 915 A1 discloses a method for voice recognition on terminals in telecommunication systems in which the terminal contains a speech buffer that additionally holds the voice signals supplied to a voice recognition section. If the voice recognition device in the terminal cannot clearly recognize a voice input and associate it with a prescribed reference pattern, the buffered voice signals are forwarded to a central voice recognition device, which is arranged in the telecommunication system and has greater storage and computing capacity.
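The local-first, central-fallback scheme of DE 35 19 915 A1, as summarized above, can be sketched roughly as follows. This is a minimal illustration only and not the method of the cited document. It assumes that voice inputs have already been reduced to fixed-length feature vectors, that a "clear" local match means the nearest reference pattern lies within a distance threshold and beats the runner-up by a margin, and that the central voice recognition device can be modeled as a plain callable. The names `LocalRecognizer`, `Terminal`, `max_distance` and `margin` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Features = List[float]


def distance(a: Features, b: Features) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


@dataclass
class LocalRecognizer:
    # Hypothetical: one prescribed reference pattern per word.
    reference_patterns: Dict[str, Features]
    max_distance: float  # assumed threshold for an acceptable match
    margin: float        # assumed minimum gap to the second-best match

    def recognize(self, features: Features) -> Optional[str]:
        """Return the matched word, or None if no clear match exists."""
        scored = sorted(
            (distance(features, pattern), word)
            for word, pattern in self.reference_patterns.items()
        )
        if not scored:
            return None
        best_dist, best_word = scored[0]
        if best_dist > self.max_distance:
            return None  # too far from every reference pattern
        if len(scored) > 1 and scored[1][0] - best_dist < self.margin:
            return None  # ambiguous between two reference patterns
        return best_word


class Terminal:
    def __init__(self, local: LocalRecognizer,
                 central: Callable[[Features], str]) -> None:
        self.local = local
        self.central = central  # stands in for the central recognition device
        self.speech_buffer: Features = []

    def handle_input(self, features: Features) -> str:
        # The buffer keeps a copy of the signal given to the local recognizer.
        self.speech_buffer = list(features)
        word = self.local.recognize(features)
        if word is None:
            # No clear local match: forward the buffered signal centrally.
            word = self.central(self.speech_buffer)
        return word
```

For example, with reference patterns `{"call": [1.0, 0.0], "hang up": [0.0, 1.0]}`, `max_distance=0.5` and `margin=0.2`, the input `[0.9, 0.1]` is resolved locally as `"call"`, whereas `[0.5, 0.5]` is equidistant from both patterns and beyond the threshold, so it is forwarded to the central device. The threshold-plus-margin test is just one plausible reading of "recognize clearly"; a real system would more likely use model likelihoods.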
It is therefore an object of the present invention to provide a way of configuring speaker-independent voice recognition for a communication appliance on an individual basis.