The present invention relates to a method in speech recognition, a speech recognition device, and a speech-controlled wireless communication device.
To facilitate the use of wireless communication devices, speech recognition devices have been developed with which a user can utter speech commands that the device attempts to recognize and convert into the corresponding function, e.g. a command to select a telephone number. One problem in implementing speech control is that different users say the speech commands in different ways: the speech rate, the speech volume, the voice tone, etc. vary between users. Furthermore, speech recognition is disturbed by background noise, which can cause significant interference outdoors and in a car. Background noise makes it difficult to recognize words and to distinguish between different words, e.g. when a telephone number is uttered.
Some speech recognition devices apply a recognition method based on a fixed time window. The user thus has a predetermined time within which s/he must utter the desired command word. After the time window has expired, the speech recognition device attempts to determine which word or command the user uttered. However, a method based on a fixed time window has, for example, the disadvantage that not all words to be uttered are equally long; in names, for instance, the given name is often clearly shorter than the family name. Consequently, after a shorter word the user waits longer for the recognition than after a longer word, which is inconvenient. Furthermore, the time window must be set according to slower speakers so that recognition is not started until the whole word has been uttered. When words are uttered faster, the delay between the utterance and the recognition adds to the inconvenience.
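The waiting time described above can be illustrated with a minimal sketch. The function name, the window length and the word durations are hypothetical values chosen only for illustration; they do not come from any particular device.

```python
def idle_time_after_word(word_duration_s: float, window_s: float) -> float:
    """Time the user waits between finishing a word and the end of the
    fixed time window, i.e. before recognition can start."""
    if word_duration_s > window_s:
        raise ValueError("the word does not fit in the time window")
    return window_s - word_duration_s


# Hypothetical window, sized so that a slow speaker can finish a long word.
WINDOW_S = 1.5

# Hypothetical utterance durations in seconds.
durations = {"given name": 0.4, "family name": 1.1}

# The shorter the word, the longer the user waits for the window to expire.
delays = {
    label: idle_time_after_word(duration, WINDOW_S)
    for label, duration in durations.items()
}
```

With these assumed figures, the short given name leaves the user waiting noticeably longer than the family name, although both are recognized only when the same window expires.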
Another known speech recognition method is based on patterns formed of speech signals and on their comparison. Patterns formed of command words are stored beforehand, or the user may have taught the device desired words, which have been formed into patterns and stored. During the utterance, the speech recognition device compares the stored patterns with feature vectors formed of the sounds uttered by the user and calculates the probability for the different words (command words) in the vocabulary of the speech recognition device. When the probability for a command word exceeds a predetermined value, the speech recognition device selects this command word as the recognition result. Consequently, incorrect recognition results may occur, particularly for words whose beginning phonetically resembles another word in the vocabulary. For example, suppose the user has taught the speech recognition device the words “Mari” and “Marika”. When the user says the word “Marika”, the speech recognition device may select “Mari” as the recognition result even though the user has not yet had time to articulate the end of the word. Such speech recognition devices typically use the so-called Hidden Markov Model (HMM) speech recognition method.
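The premature decision in the “Mari”/“Marika” example can be reproduced with a small sketch. It is a toy under stated assumptions: letters stand in for feature vectors, and the hypothetical `match_score` prefix score stands in for the word probability; it is not an HMM and not the scoring of any actual device.

```python
def match_score(heard: str, word: str) -> float:
    """Toy stand-in for a word probability: the fraction of `word`
    matched by the sounds heard so far (letters stand in for sounds)."""
    matched = 0
    for h, w in zip(heard, word):
        if h != w:
            break
        matched += 1
    return matched / len(word)


def recognize_incrementally(utterance, vocabulary, threshold):
    """Decide as soon as any word's score exceeds the threshold.
    Returns the chosen word and the part of the utterance heard so far."""
    heard = ""
    for sound in utterance:
        heard += sound
        for word in vocabulary:
            if match_score(heard, word) > threshold:
                return word, heard
    return None, heard


# The user says "marika", but the decision falls as soon as "mari" is heard.
result, heard = recognize_incrementally("marika", ["mari", "marika"], 0.9)
```

In this sketch the decision is made after the fourth sound, before the ending that distinguishes the two words has been uttered.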
U.S. Pat. No. 4,870,686 presents a speech recognition method and a speech recognition device in which the end of a word uttered by the user is determined on the basis of silence; in other words, the speech recognition device examines whether or not there is a perceivable audio signal. A problem with this solution is that excessively loud background noise may prevent the detection of pauses, in which case the speech recognition is not successful.
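The effect of background noise on silence-based end-of-word detection can be sketched as follows. This is a generic energy-threshold detector written for illustration only, not the method of the cited patent; the function name, the frame energies, the threshold and the noise level are all assumed values.

```python
def find_end_of_word(frame_energies, silence_threshold, min_silent_frames):
    """Return the index of the first frame of the pause that ends the word,
    or None if no sufficiently long pause is found after speech."""
    silent_run = 0
    speech_seen = False
    for i, energy in enumerate(frame_energies):
        if energy >= silence_threshold:
            # Audible frame: speech (or noise) is present, reset the pause.
            speech_seen = True
            silent_run = 0
        elif speech_seen:
            silent_run += 1
            if silent_run == min_silent_frames:
                return i - min_silent_frames + 1
    return None


# Hypothetical per-frame energies: silence, a word, then a pause.
speech = [0.0, 0.0, 0.8, 0.9, 0.7, 0.0, 0.0, 0.0]

# In a quiet environment the pause after the word is detected.
quiet_end = find_end_of_word(speech, 0.2, 3)

# Constant background noise above the threshold hides the pause.
noisy_end = find_end_of_word([e + 0.3 for e in speech], 0.2, 3)
```

In the quiet case the detector reports the pause that begins at frame 5; with the assumed noise floor every frame stays above the threshold, so no pause is found and the end of the word is never detected.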