Methods of information retrieval and electronic device control based on a user's utterance of a word or a phrase, or on other unique sounds made by the user, have been available for a number of years. In handheld telephones and other handheld electronic devices, an ability to retrieve stored information, such as a telephone number or contact information, using words, phrases, or other unique sounds (hereafter generically referred to as utterances) is very desirable in certain circumstances, such as while the user is walking or driving. As a result of the increase in computing power of handheld devices over the last several years, various methods that use an utterance to retrieve stored information have been developed and incorporated into handheld telephones.
One class of techniques that has been developed for retrieving telephone numbers uses voice tag technology. One well-known speaker-dependent voice tag retrieval technique that uses dynamic time warping (DTW) has been successfully implemented in a network server because of its large storage requirement. In this technique, a set of a user's reference utterances is stored, each reference utterance being stored as a series of spectral values in association with a different stored telephone number. These reference utterances are known as voice tags. When an utterance is thereafter received by the network server and is identified to the network server as being intended for the retrieval of a stored telephone number (this utterance is hereafter called a retrieval utterance), the retrieval utterance is also rendered into a series of spectral values and compared to the set of voice tags using the DTW technique, and the voice tag that compares most closely to the retrieval utterance determines which stored telephone number is retrieved. This method is called a speaker-dependent method because the voice tags are rendered by one user. This method has proven useful, but the number of voice tags that can be stored is limited by the size of the series of spectral values that represents each voice tag. The reliability of this technique has been acceptable to some users, but higher reliability would be desirable.
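The comparison described above can be illustrated with a minimal sketch, which is not taken from any particular implementation. It assumes that each voice tag and each retrieval utterance has already been rendered into a list of short spectral frames, that frames are compared by Euclidean distance, and that the stored telephone number with the smallest accumulated DTW cost is returned. The names frame_distance, dtw_distance, and retrieve_number, the frame values, and the telephone numbers are hypothetical.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two spectral frames (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b):
    """Accumulated cost of the best time alignment between two frame series."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] is the best cost of aligning the first i frames of seq_a
    # with the first j frames of seq_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # frame of seq_b is stretched
                cost[i][j - 1],      # frame of seq_a is stretched
                cost[i - 1][j - 1],  # both series advance together
            )
    return cost[n][m]


def retrieve_number(retrieval_utterance, voice_tags):
    """Return the number stored with the voice tag closest to the utterance."""
    best_number, best_score = None, float("inf")
    for tag_frames, number in voice_tags:
        score = dtw_distance(retrieval_utterance, tag_frames)
        if score < best_score:
            best_number, best_score = number, score
    return best_number


# Hypothetical stored voice tags: (series of spectral frames, telephone number).
voice_tags = [
    ([[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]], "555-0100"),
    ([[5.0, 5.0], [4.0, 6.0], [3.0, 7.0]], "555-0199"),
]

# A slower rendition of the first tag: the first frame is held for longer.
query = [[0.1, 1.0], [0.1, 1.0], [1.1, 0.9], [2.0, 0.1]]
print(retrieve_number(query, voice_tags))
```

Because every voice tag is kept as a full series of frames, the storage needed grows with both the number of tags and the length of each tag, which is consistent with the storage limitation noted above.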
Another well-known speaker-dependent voice tag retrieval technique also stores voice tags in association with telephone numbers, but the voice tags are stored more compactly, in the form of a hidden Markov model (HMM). Since this technique requires significantly less storage space, it has been successfully implemented in a handheld device, such as a mobile telephone. Retrieval utterances are compared to an HMM of the feature vectors of the voice tags. This technique generally requires more computing power, since the HMM is generated within the handheld telephone (generating the user-dependent HMM in the fixed network would typically require too much data transfer).
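As a rough illustration of how a retrieval utterance might be scored against compact HMM voice tags, the following sketch makes several simplifying assumptions that are not stated above: the feature vectors have been quantized into a small set of discrete symbols, each voice tag is a two-state left-to-right HMM with fixed parameters, and the tag with the highest log-likelihood under the forward algorithm is selected. The names log_likelihood and retrieve_number, the model parameters, and the telephone numbers are hypothetical.

```python
import math


def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM."""
    if not obs:
        return 0.0
    n_states = len(start)
    # Joint probability of each state and the first observed symbol.
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transitions, then weight by the emission.
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n_states))
                * emit[s][obs[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model cannot produce this sequence
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_prob


def retrieve_number(obs, tag_models):
    """Return the number stored with the HMM that best explains obs."""
    best_number, best_score = None, float("-inf")
    for (start, trans, emit), number in tag_models:
        score = log_likelihood(obs, start, trans, emit)
        if score > best_score:
            best_number, best_score = number, score
    return best_number


# Hypothetical two-state left-to-right models over three quantized symbols.
start = [1.0, 0.0]
trans = [[0.6, 0.4], [0.0, 1.0]]
tag_models = [
    ((start, trans, [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]), "555-0100"),
    ((start, trans, [[0.1, 0.1, 0.8], [0.1, 0.8, 0.1]]), "555-0199"),
]

query = [0, 0, 1, 1]  # quantized feature vectors of a retrieval utterance
print(retrieve_number(query, tag_models))
```

In this sketch each voice tag occupies a fixed number of model parameters regardless of how long the reference utterance was, which reflects the storage advantage described above. Estimating those parameters from the user's reference utterances, the step that demands more computing power, is not shown.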
Skilled artisans will appreciate that elements in the figures are illustrated for simplicity and clarity and have not necessarily been drawn to scale. For example, the dimensions of some of the elements in the figures may be exaggerated relative to other elements to help to improve understanding of embodiments of the present invention.