The invention is related to a method for controlling the operation of a telephone, especially a mobile telephone used in a cellular network. More particularly, the invention is related to a method for performing an automatic check and correction in the set-up stage of a telephone connection. The example telephone apparatus to which the invention is applied will hereinafter be called a mobile phone, by which is meant a telephone apparatus (e.g. a hand phone) in a cellular network. The invention as such is in no way limited to mobile phones but can equally be applied to telephones of the wire network.
The method according to the invention is related to the implementation of a user interface of a telephone apparatus, and hereinafter a user interface applying the method according to the invention is called the user interface according to the invention. It can be advantageously applied to telephone voice control, which below will be used as an example of implementation of the invention. However, the invention is not limited to voice-controlled user interfaces but it can also be used in user interfaces based on push-button commands.
Telephone voice control as such is not a new invention. When used in a car, a mobile phone often has to be capable of operating in the hands-free mode, for which the car has to be equipped with hands-free facilities including a separate microphone and a loudspeaker. A driver who speaks on the phone can then use both hands for driving the car during the call. The advantages of hands-free operation are convenience and added safety. Because of its convenience of use, the hands-free facility is also used as a desktop hands-free installation in the office environment.
The practicability of the hands-free facility is affected by the fact that, when making a call, the number usually has to be dialled using the keypad user interface of the telephone. The same is true for answering the phone. A voice-controlled telephone eliminates this problem because the keypad user interface of the phone is not needed for making or answering a call.
Several different ways of implementing a voice-controlled telephone user interface are known in the prior art. Such methods are disclosed e.g. in U.S. Pat. Nos. 5,222,121 and 4,928,302. Prior art voice-controlled user interfaces are discussed below, mainly in general terms, together with some details of particular arrangements.
Two commonly used concepts related to the voice-controlled telephone user interface are digit dialling and repertory dialling. In repertory dialling, the user selects a phone number on the basis of a pre-recorded voice recording. The voice recording corresponds to a name associated with the phone number, whereby it is possible to select a number on the basis of the name of the owner of the number. The voice command can comprise one or more words, e.g. "John" or "John Smith".
Before a repertory dialling command the telephone has to be set into a mode where it knows to expect a name. This can be achieved either with a voice command or using the keypad of the phone. In a car installation, it is also possible to have an additional external control facility e.g. in the vicinity of the steering wheel, thus making it easy to activate the repertory dialling state.
An essential feature in the repertory dialling user interface is the training phase in which the user stores the names associated with the numbers as voice messages, or frequency and time coded signals, in the telephone's memory. Depending on the implementation of the user interface the user may have to repeat a name more than once to make a reliable recording for speech recognition. In the recognition phase, the phone compares the spoken name command to all the recordings and, on the basis of a statistical comparison, selects the voice recording that best matches the command.
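The recognition phase described above can be sketched as a nearest-template search. This is only an illustrative sketch: the text does not specify the statistical comparison, so Euclidean distance between fixed-length feature vectors is assumed here as a stand-in, and the name `best_match` and the example vectors are hypothetical.

```python
import math
from typing import Dict, Sequence


def best_match(command: Sequence[float],
               templates: Dict[str, Sequence[float]]) -> str:
    """Return the stored name whose template is closest to the spoken command.

    Assumption: each recording is reduced to a fixed-length feature vector and
    "best match" means smallest Euclidean distance. A real recognizer would use
    a more elaborate statistical comparison.
    """
    if not templates:
        raise ValueError("no trained recordings to compare against")
    return min(templates, key=lambda name: math.dist(command, templates[name]))


# Hypothetical trained recordings (training phase) and one spoken command.
trained = {"John": [0.0, 1.0], "Mary": [1.0, 0.0]}
print(best_match([0.1, 0.9], trained))
```

Because the search always returns some stored name, even for a poor match, a verification step of the kind discussed next is needed.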
Since a recognition error is always possible, the phone usually verifies the recognized name in some way. Usually this is done by reproducing the recognized voice recording and requiring user verification. Once the phone has reproduced the voice recording that it found on the basis of the recognition, it expects the user to give an affirmative or negative answer. If the recognition was correct, the user says e.g. "yes", whereby the phone begins to set up the connection. If, on the other hand, the recognition was incorrect, the user says e.g. "no", whereby a prior art telephone usually returns to the initial state of the repertory dialling. An improvement to this method of operation is disclosed in U.S. Pat. No. 4,928,302, in which the user does not have to verify a correct recognition with an affirmative answer like "yes", since the phone, having reproduced the recording that it found as a result of the recognition, starts to set up the connection either immediately or after a short delay. If the recognition was incorrect, the user may cancel the call during said delay or even during call set-up. In addition, U.S. Pat. No. 5,222,121 discloses an improvement to the method discussed, in which the telephone selects several recognition results and first reproduces the result that best matches the spoken command. If the user gives a negative answer to this reproduction, the phone selects the result that is the second best match. In the primary claim of U.S. Pat. No. 5,222,121 this function is generalized so that, in response to each repetition of a particular voice command, the telephone indicates the next best candidate as a voice reproduction and/or on the display, the candidates having been arranged in order.
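The verification scheme with ordered candidates can be illustrated with a short sketch. It is not the implementation of either cited patent; it only mirrors the behaviour described above, in which the best candidate is offered first and the next best one follows each negative answer. The function name `confirm_candidate` and the callback `ask_user` are hypothetical.

```python
from typing import Callable, Optional, Sequence


def confirm_candidate(ranked: Sequence[str],
                      ask_user: Callable[[str], bool]) -> Optional[str]:
    """Offer candidates in ranked order until the user accepts one.

    ranked   -- recognition results, best match first (assumed already sorted)
    ask_user -- hypothetical callback that reproduces the candidate to the user
                and returns True for "yes" and False for "no"
    Returns the accepted name, or None if every candidate was rejected.
    """
    for name in ranked:
        if ask_user(name):
            return name  # the phone would now begin call set-up
    return None  # fall back to the initial state of repertory dialling


# Example: the user rejects the best match and accepts the second best.
accepted = confirm_candidate(["John", "Joan", "Jon"], lambda name: name == "Joan")
print(accepted)
```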
In digit dialling the user selects a phone number using a voice command comprising a series of digits. In other words, the number is spoken to the phone, which recognizes the series of digits and sets up a connection to the telephone number it recognized. Before the user utters the phone number, the phone has to be set into a mode where it knows to expect a number. This can be achieved either with a voice command or using the push buttons on the telephone's keypad. In a car installation, it is also possible to have an additional external control facility e.g. in the vicinity of the steering wheel, thus making it easy to activate the digit dialling state.
Since not all telephone numbers are equally long, the user has to end the series of digits with a command word (e.g. "dial") to inform the telephone that the number contains no more digits. In principle, the telephone could infer this from the silence that follows the uttering of the digits, but such a method results in delay and uncertainty, especially in a noisy environment. Indeed, the prior art exclusively uses methods in which a command word ends the number. So, a digit dialling command could be e.g. "nine three one two two three two three four three dial".
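The role of the terminating command word can be shown with a minimal sketch that operates on already recognized words. The speech recognition itself is outside its scope, and the names `DIGIT_WORDS`, `END_COMMAND` and `parse_digit_command` are hypothetical.

```python
from typing import Iterable, Optional

DIGIT_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}
END_COMMAND = "dial"  # command word that ends the number


def parse_digit_command(words: Iterable[str]) -> Optional[str]:
    """Return the dialled number, or None if the command is not complete or valid.

    The number is accepted only when the command word is heard, so the parser
    never has to guess from silence whether more digits will follow.
    """
    digits = []
    for word in words:
        if word == END_COMMAND:
            return "".join(digits) if digits else None
        digit = DIGIT_WORDS.get(word)
        if digit is None:
            return None  # unknown word: reject the whole command
        digits.append(digit)
    return None  # command word not heard yet


command = "nine three one two two three two three four three dial"
print(parse_digit_command(command.split()))
```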
Voice-controlled phones using digit dialling differ from each other significantly. In the most widely used method the digits have to be uttered separately with a short pause between the individual digits. Such a recognition method is called isolated word recognition. Another method is to utter the whole sequence of digits without pauses; such a method is called connected word recognition. For the speech recognition unit of a phone the recognition of individual digits is much easier than of whole digit sequences in which the transition points between individual digits are unknown. For the user, however, uttering connected digits is the more natural way of selecting phone numbers.
In prior art user interfaces based on digit dialling the most important factor affecting the correct recognition of a phone number is the probability that the phone recognizes an individual digit correctly. The effect of this probability can be illustrated by the following example: Let us assume that a user dictates a completely random nine-digit telephone number and let us further assume that the speech recognition unit in the phone operates purely on a guessing basis, whereby the probability of recognizing an individual digit correctly is 0.1. The probability that the phone recognizes the whole nine-digit sequence correctly is then $(0.1)^9 = 10^{-9}$, or one in a billion. Even if the speech recognition unit were improved so that the probability of a correct recognition of an individual digit were 0.8, or 80%, the probability of a correct recognition of the whole nine-digit sequence would still be a modest 13% ($0.8^9 \approx 0.134$).
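The arithmetic of this example can be reproduced with a few lines of code. The sketch assumes, as the example implicitly does, that the digits are recognized independently of each other; the function name `sequence_accuracy` is hypothetical.

```python
def sequence_accuracy(p_digit: float, n_digits: int) -> float:
    """Probability that every digit of an n-digit number is recognized correctly.

    Assumption: recognition errors on individual digits are independent, so the
    per-digit probabilities multiply.
    """
    return p_digit ** n_digits


# The two per-digit probabilities used in the example above.
for p in (0.1, 0.8):
    print(f"per-digit {p}: nine-digit sequence {sequence_accuracy(p, 9):.3g}")
```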
From the user's point of view, it is frustrating if a voice-controlled phone correctly recognizes only about every seventh phone number on average. A similar problem may arise in connection with a user interface based on push-button commands, especially with forgetful or clumsy users: although the phone recognizes the push-button commands with a 100% probability, the user may misremember the phone number or the corresponding alphanumeric character sequence, or may push a wrong button.