The present invention relates to a method for recognising speech and a device that utilises the speech recognition method according to the invention.
Normally, in mobile telephones, it is possible to browse through a telephone notepad and select a name by using the first letter of the name searched for. In this case, when the user presses, e.g., the letter "s" during the search, the names beginning with the letter "s" are retrieved from memory. The user can thus find the desired name more quickly, without having to browse through the whole notepad in alphabetical order. This kind of method is fully manual: it is based on commands given by the user through a keyboard and on browsing the memory according to those commands.
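The manual first-letter lookup described above can be sketched as a binary search over a sorted list of names. This is only an illustration under assumptions: the class name `Notepad` and the case-insensitive sorted storage layout are hypothetical and are not taken from any particular telephone.

```python
from bisect import bisect_left
from typing import List


class Notepad:
    """Hypothetical phone notepad kept sorted case-insensitively."""

    def __init__(self, names: List[str]) -> None:
        # Store (lower-case key, original name) pairs in sorted order.
        self._entries = sorted((name.lower(), name) for name in names)

    def starting_with(self, letter: str) -> List[str]:
        """Return all names whose first letter matches `letter`."""
        key = letter.lower()
        # The 1-tuple (key,) sorts before every (key, name) pair,
        # so bisect_left finds the first possible match.
        start = bisect_left(self._entries, (key,))
        matches = []
        for low, name in self._entries[start:]:
            if not low.startswith(key):
                break  # entries are sorted, so no later entry can match
            matches.append(name)
        return matches
```

For example, `Notepad(["Sam", "anna", "Steve", "bob", "sara"]).starting_with("S")` returns `["Sam", "sara", "Steve"]` without scanning the names that sort before "s".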
Today, there are also some mobile stations that utilise speech recognition devices, with which the user can give commands by voice. In these devices, the speech recognition device is often speaker-dependent, i.e. its operation is based on the user teaching it the words that it is later supposed to recognise. There are also so-called speaker-independent speech recognition devices, for which no separate training phase is required. In this case, the operation of the speech recognition device is based on a large amount of training material compiled from a large sample of different types of speakers. A speaker-independent recognition device typically performs moderately well for a so-called average user. Correspondingly, a speaker-dependent speech recognition device performs best for the person who has trained it.
It is typical of both speech recognition devices mentioned above that their performance depends greatly on the size of the vocabulary used. It is also typical of prior-art speech recognition devices that they are limited to a specific number of words which they are capable of recognising. For example, in mobile stations, a user is provided with a maximum of 20 names, which he/she can store by voice in a notepad within the telephone and then use in connection with voice selection. Such a number is clearly not sufficient for present or future applications, where the objective is to substantially increase the number of words to be recognised. If the number of words to be recognised increases, e.g. ten-fold, current methods cannot maintain the same recognition performance as with a smaller vocabulary. Another limiting factor, e.g. in terminal equipment, is the amount of memory required, which naturally increases as the vocabulary of the speech recognition device expands.
In prior-art speech recognition devices, the speech recognition device can be activated by voice using a specific activation command, such as "ACTIVATE", whereupon the device is activated and ready to receive commands from the user. A speech recognition device can also be activated with a separate key. In speech recognition devices activated by voice, the performance of the activation typically depends on the noise level of the surroundings. The ambient noise level also greatly affects the performance that the speech recognition device can achieve during operation. The critical parameters for the performance of a speech recognition device are therefore the size of the vocabulary and the ambient noise conditions.
A further known speech recognition system is disclosed in U.S. Pat. No. 4,866,778, in which a user can select a sub-vocabulary by entering an initial string of one or more letters, so that recognition is performed against a sub-vocabulary restricted to words starting with those letters.
We have now invented a method and a device for recognising speech, the objective of which is to avoid or at least mitigate the above-mentioned problems of the prior art. The present invention relates to a device and a method in which a user is allowed to give, during speech recognition, a qualifier by means of which speech recognition is limited only to those speech models that correspond to the qualifier provided by the user. In this case, only a specific sub-set of the prestored speech models is selected for use during speech recognition.
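The core idea, comparing the utterance only with the sub-set of models that match the user's qualifier, can be sketched as follows. This is a simplified illustration, not the recogniser of the invention: it assumes each model and each utterance is a fixed-length feature vector and uses nearest-template matching with Euclidean distance, whereas practical recognisers typically use techniques such as hidden Markov models or dynamic time warping. The function name `recognise` is hypothetical.

```python
import math
from typing import Callable, Mapping, Optional, Sequence


def recognise(utterance: Sequence[float],
              models: Mapping[str, Sequence[float]],
              qualifier: Optional[Callable[[str], bool]] = None) -> Optional[str]:
    """Return the label of the closest model, searching only the models
    accepted by `qualifier` (all models if no qualifier is given)."""
    # Select the sub-set of prestored models allowed by the qualifier.
    subset = [label for label in models
              if qualifier is None or qualifier(label)]
    if not subset:
        return None
    # Compare the utterance only with that sub-set; the nearest template wins.
    return min(subset, key=lambda label: math.dist(utterance, models[label]))
```

With `models = {"Sam": [1.0, 0.0], "Sara": [0.9, 0.2], "Bob": [0.95, 0.1]}`, the utterance `[0.96, 0.1]` is recognised as "Bob" over the full vocabulary, but as "Sam" when the qualifier `lambda w: w.startswith("S")` restricts the comparison to names beginning with "S". Restricting the candidate set both reduces the number of comparisons and removes confusable words outside the sub-set.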
According to an embodiment of the invention, the speech recognition device is activated at the same time as a qualifier limiting speech recognition is provided by touching the device, using its existing keyboard or touch-sensitive screen/base. The activation is most preferably implemented with a key. The method according to the invention thus provides the user with a logical way to activate the speech recognition device while at the same time improving its performance by means of the entered qualifier. The limitation of speech recognition according to the invention can also be implemented separately from the activation of the speech recognition device.
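A single key press that both activates recognition and supplies the qualifier might look like the sketch below. It assumes a conventional telephone keypad on which each digit key carries a group of letters (the common ITU-T E.161 layout, 2 = "abc" through 9 = "wxyz"), and that pressing a key limits recognition to names beginning with any of that key's letters. The class `VoiceDialler` and its method are hypothetical.

```python
from typing import Iterable, List

# Letter groups printed on a conventional phone keypad (ITU-T E.161 layout).
KEYPAD_LETTERS = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}


class VoiceDialler:
    """Hypothetical controller in which one key press both activates the
    recogniser and fixes the qualifier that narrows the vocabulary."""

    def __init__(self, names: Iterable[str]) -> None:
        self.names = list(names)
        self.active = False
        self.candidates: List[str] = []

    def on_key_press(self, key: str) -> List[str]:
        letters = KEYPAD_LETTERS.get(key)
        if letters is None:
            return []  # assumption: keys without letters neither activate nor qualify
        self.active = True  # the same press activates speech recognition
        # Keep only names whose first letter belongs to the pressed key.
        self.candidates = [n for n in self.names if n and n[0].lower() in letters]
        return self.candidates
```

For instance, pressing "5" on a dialler holding `["Jane", "Kim", "Liam", "Mark"]` activates it and leaves `["Jane", "Kim", "Liam"]` as the vocabulary for the following utterance.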
According to an exemplary embodiment of the invention, the device comprises a touch-sensitive screen or surface (base), and information about the character or characters written on the screen is transmitted to the speech recognition device, whereupon speech recognition is limited to words in which the characters in question occur. Speech recognition is most preferably limited to names beginning with the character written by the user on the touch screen.
According to an exemplary embodiment of the invention, speech recognition can also be implemented by first making use of all the stored models and then utilising the limiting qualifier provided by the user when determining the final recognition result.
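This two-stage variant can be sketched as follows, under the same simplifying assumptions as a nearest-template recogniser (fixed-length feature vectors, Euclidean distance). The first pass ranks every stored model, so it can run before the qualifier is known; the second pass picks the best-ranked candidate that satisfies the qualifier. The function names `rank_all` and `final_decision` are hypothetical.

```python
import math
from typing import Callable, List, Mapping, Optional, Sequence


def rank_all(utterance: Sequence[float],
             models: Mapping[str, Sequence[float]]) -> List[str]:
    """First pass: order every stored model by distance to the utterance."""
    return sorted(models, key=lambda label: math.dist(utterance, models[label]))


def final_decision(ranking: List[str],
                   qualifier: Callable[[str], bool]) -> Optional[str]:
    """Second pass: return the best-ranked label that satisfies the qualifier."""
    return next((label for label in ranking if qualifier(label)), None)
```

With `models = {"Sam": [1.0, 0.0], "Sara": [0.9, 0.2], "Bob": [0.95, 0.1]}`, `rank_all([0.96, 0.1], models)` gives `["Bob", "Sam", "Sara"]`, and applying the qualifier "starts with S" yields "Sam" as the final result.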
According to a first aspect of the invention there is provided a method for recognising an utterance of a user with a device, wherein a set of models of the utterances has been stored in advance and for speech recognition, the utterance of the user is received, the utterance of the user is compared with the prestored models and, on the basis of the comparison, a recognition decision is made, the method being characterised in that,
the user is allowed to provide a qualifier limiting the comparison by touching the device, the qualifier identifying an item in a menu structure of the device,
a sub-set of models is selected from the stored models on the basis of the qualifier provided by the user, said sub-set of models identifying sub-items of the menu structure, and
a comparison is made for making the recognition decision by comparing the utterance of the user with said sub-set of models.
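The steps of this first aspect can be illustrated with a small sketch in which the touched menu item acts as the qualifier and its sub-items define the sub-set of models. The menu contents, the two-dimensional feature vectors, and the nearest-template comparison are all illustrative assumptions; `MENU` and `recognise_in_menu` are hypothetical names.

```python
import math
from typing import Dict, List, Optional, Sequence

# Hypothetical menu structure: each top-level item lists its sub-items.
MENU: Dict[str, List[str]] = {
    "Messages": ["Inbox", "Outbox", "Write message"],
    "Settings": ["Ringtone", "Display", "Security"],
}


def recognise_in_menu(utterance: Sequence[float],
                      touched_item: str,
                      models: Dict[str, Sequence[float]]) -> Optional[str]:
    """Compare the utterance only with the models of the sub-items of the
    menu item the user touched (the qualifier)."""
    sub_items = MENU.get(touched_item, [])
    # The selected sub-set: sub-items of the touched item that have a stored model.
    subset = [s for s in sub_items if s in models]
    if not subset:
        return None
    return min(subset, key=lambda s: math.dist(utterance, models[s]))
```

For example, with models `{"Inbox": [0.0, 1.0], "Outbox": [1.0, 1.0], "Write message": [0.5, 0.0], "Ringtone": [0.1, 0.9], "Display": [0.9, 0.9], "Security": [0.5, 0.1]}` and the utterance `[0.1, 0.95]`, touching "Messages" yields "Inbox" even though "Ringtone" is the nearest model overall, because "Ringtone" is not a sub-item of the touched menu item.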
According to a second aspect of the invention there is provided a method for recognising an utterance of a user with a device, wherein a set of models of the utterances has been stored in advance and for speech recognition, the utterance of the user is received, the utterance of the user is compared with the prestored models and, on the basis of the comparison, a recognition decision is made, the method being characterised in that,
a comparison is made for making a first recognition decision by comparing the utterance of the user with the prestored models,
the user is allowed to provide a qualifier limiting the comparison by touching the device for selecting a sub-set of models, the qualifier identifying an item in a menu structure of the device and said sub-set of models identifies sub-items of the menu structure,
a final comparison is made for making the recognition decision by comparing the first recognition decision with said sub-set of models.
According to a third aspect of the invention there is provided a device comprising a speech recognition device for recognising the utterance of a user, memory means for storing speech models, and means for receiving the utterance of the user, comparison means for carrying out the recognition process by comparing the utterance of the user with the models stored in the memory means, the device being characterised in that the device also comprises means for receiving a qualifier from the user by touching the device, means for selecting a set from the stored models on the basis of the qualifier received from the user for limiting the comparison made by the comparison means to said set of models and means for storing a menu structure of a device and for identifying the received qualifier as an item in a menu structure of the device.