1. Field of the Invention
The present invention relates to a technique for improved speech recognition in a telephone, particularly a mobile telephone, such as for automatic hands-free dialing.
2. Description of the Related Art
In recent years, telephones have become equipped with optional speech recognition circuitry to enable special hands-free functions to be carried out, such as automatic hands-free dialing. In the mobile phone environment, hands-free dialing by speech recognition is particularly useful to enable users to place calls while driving by reciting a name or number of a party to be called (called party). The mobile phone is equipped with a speech recognition circuit to convert the user's speech into audio feature data. Typically, the feature data is compared to different sets of pre-stored feature data corresponding to names previously recorded by the user during a registration process. If a match is found, the number corresponding to the name is automatically dialed.
According to a conventional speech recognition method applied to a Code Division Multiple Access (CDMA) mobile phone or the like, a match between the user's current speech and a pre-recorded called party name is established by comparing the current feature data (corresponding to the current speech) with each set of pre-stored feature data to determine the most similar data set. If the difference between the most similar data set and the current feature data is below a predetermined threshold, then the most similar data set is determined to match the current speech. Once a match is established, the telephone number of the called party corresponding to the most similar data set may be automatically dialed. On the other hand, if the difference is above the threshold, a matching condition will not be established. Note that a match will be made with the wrong called party if that party's feature data happens to be closest to the current feature data and the difference is below the threshold. Another problem may occur when more than one recorded feature data set is highly similar to the current feature data, with the difference between each highly similar set and the current data being less than the threshold. In this case, the user may be prompted to repeat the utterance or perform some other task to identify which called party name is intended.
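The conventional fixed-threshold comparison described above can be illustrated with a minimal sketch. It is not taken from any particular telephone implementation: the representation of feature data as lists of numbers, the use of Euclidean distance as the "difference", and the names `distance` and `match_fixed_threshold` are all assumptions made for illustration only.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors.

    Assumption: the source only speaks of a "difference"; Euclidean
    distance is used here purely as a stand-in measure.
    """
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_fixed_threshold(current, registered, threshold):
    """Compare current feature data with every pre-stored set.

    registered: dict mapping a called party name to its stored features
                (hypothetical structure).
    Returns (status, names), where status is one of:
      "no_match"  - no stored set is closer than the threshold
      "match"     - exactly one stored set is closer than the threshold
      "ambiguous" - several stored sets are closer than the threshold,
                    so the user would have to be prompted again
    """
    # Score every stored set, most similar (smallest difference) first.
    scored = sorted(
        (distance(current, feats), name) for name, feats in registered.items()
    )
    below = [name for diff, name in scored if diff < threshold]
    if not below:
        return ("no_match", [])
    if len(below) > 1:
        return ("ambiguous", below)
    return ("match", below)


# Hypothetical registered names and feature data.
registered = {"alice": [1.0, 2.0], "bob": [4.0, 6.0]}
result = match_fixed_threshold([1.0, 2.5], registered, threshold=1.0)
# result == ("match", ["alice"])
```

Because the threshold is a single fixed constant, the same call returns "no_match" when the threshold is set too tightly and "ambiguous" when it is set too loosely, which mirrors the two failure modes noted above. An utterance of an unregistered name that happens to fall within the threshold of a stored set would likewise be reported as a match.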
The above approach of utilizing a fixed threshold (or thresholds) for determining whether an input utterance matches a pre-recorded name ignores the varying conditions that may be present at any given time, such as inherent features of the pronounced vocal data and personal differences in pronunciation. Consequently, a false recognition or a recognition error may be caused, resulting in an undesired party being called or in excessive non-recognition of utterances.
One example of a prior art technique designed to increase the success rate of hands-free dialing using speech recognition is presented in U.S. Pat. No. 5,640,485. In this patent, when an utterance is determined to be outside a predetermined closeness threshold with respect to all pre-recorded words, the user is prompted to repeat the utterance, and a new closeness threshold is computed based on the pair of utterances. While this technique may have some benefit in improving dialing success rates, the repetition requirement is an inconvenience to the user.