1. Technical Field
This invention relates in general to telecommunications and, more particularly, to voice activated dialing.
2. Description of the Related Art
Voice activated dialing has become a popular feature in many telephone systems. With voice activated dialing, a user can call a destination number without the need to look up the number or press any keys.
Typically, a caller can initiate a connection through a simple command such as “call the boss” or “call home.” In this example, “call” is a command recognized by the voice activated dialing system (typically through speaker independent voice recognition) and “the boss” or “home” are destinations enrolled by the user. During enrollment, the user speaks the phrase which will be used to designate a destination and provides a number, either through the telephone keypad or by speaking the numbers into the phone (which are recognized using speaker independent voice recognition). The phrase is digitally recorded by the voice activated dialing system and is also reduced to a template (typically using linear predictive coding) for speech recognition purposes.
After enrollment, if the user initiates a call, for example, “call the boss”, the system will compare the utterance with all the voice dialing templates associated with that user to determine whether there is a match. If not, the system will ask the user to repeat the request. If a match is found, the system will respond with “calling the boss”, where “the boss” is repeated in the user's own voice using the recording stored along with the template during enrollment. This confirmation gives the user an opportunity to abandon the call if the voice activated system does not properly recognize the command.
The confirmation is an important feature of the voice activated dialing system, since the matching process is not perfect and incorrect matches can occur, but its cost in storage space and time is significant. While the template is fairly small, typically 1000 bytes or less, a compressed recording requires a significantly larger amount of storage in order to retain enough speech data to re-create an acceptable recording for playback. Consequently, even for a large voice activated dialing directory, the templates themselves require only a modest amount of storage, whereas the corresponding recordings require a very large amount. Frequently, because of their size, the recordings are not stored on the system doing the recognition, but must be downloaded from another system. This introduces call setup delays which are annoying to users.
Naturally, this is a significant problem when large directories are used. The problem increases when longer utterances are recorded to accompany the templates. For a 1.5 second recording at 16,000 bits/sec, 24,000 bits or 3,000 bytes are needed. For a higher quality recording at 64,000 bits/sec, 12,000 bytes are necessary for the same recording.
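The storage figures above follow directly from duration multiplied by bit rate, divided by eight bits per byte. The following minimal sketch reproduces that arithmetic; the function name `recording_bytes` is hypothetical and chosen only for illustration.

```python
def recording_bytes(duration_s: float, bit_rate: int) -> int:
    """Return the bytes needed to store a recording.

    duration_s: length of the recording in seconds
    bit_rate:   encoding rate in bits per second
    """
    total_bits = int(duration_s * bit_rate)
    return total_bits // 8  # 8 bits per byte


# The two cases given in the text: a 1.5 second utterance at two bit rates.
low_quality = recording_bytes(1.5, 16_000)   # 24,000 bits
high_quality = recording_bytes(1.5, 64_000)  # 96,000 bits

print(low_quality, high_quality)
```

Compared with a template of roughly 1000 bytes, the recording is about three times larger at the lower bit rate and about twelve times larger at the higher one.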
While voice activated dialing should be encouraged to promote efficiency in the workplace, particularly when work includes calling many different numbers, the costs of storage can mean that workers must reduce their use of voice activated dialing. Typically, workers will eliminate the least used numbers from their voice activated dialing lists; it is these numbers that are the least likely to be memorized, requiring the workers to look up the numbers, which is inefficient.
Therefore, a need has arisen for a voice activated dialing system for large dialing directories with reduced storage requirements.
A voice activated dialing system includes a database maintaining a plurality of speech templates, associated telephone numbers, and recordings, wherein multiple speech templates can be linked to a single recording. Processing circuitry compares an utterance from a user to one or more of the templates from the database to find a matching template. When a match is found, the processing circuitry retrieves the telephone number associated with the matching template and the recording linked to the matching template. The recording is replayed to the user prior to initiating a connection to the associated telephone number.
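The arrangement described above, in which several templates share one recording, can be illustrated with a small data-structure sketch. Everything here is an assumption made for illustration: the class and method names (`Directory`, `enroll`, `lookup`) are hypothetical, a short list of floats stands in for a real speech template, and a Euclidean distance with a fixed threshold stands in for whatever matching method an actual system would use.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Template:
    features: List[float]  # assumed stand-in for a speech template
    number: str            # telephone number associated with the template
    recording_id: int      # link to a (possibly shared) recording


class Directory:
    """Toy database: many templates may link to one stored recording."""

    def __init__(self) -> None:
        self.recordings: Dict[int, bytes] = {}
        self.templates: List[Template] = []

    def add_recording(self, recording_id: int, audio: bytes) -> None:
        self.recordings[recording_id] = audio

    def enroll(self, features: List[float], number: str, recording_id: int) -> None:
        # Only a link is stored with the template, not a copy of the audio.
        if recording_id not in self.recordings:
            raise KeyError(f"unknown recording {recording_id}")
        self.templates.append(Template(features, number, recording_id))

    def lookup(self, utterance: List[float],
               max_distance: float = 0.5) -> Optional[Tuple[str, bytes]]:
        """Return (number, recording) for the closest template, or None."""
        best: Optional[Template] = None
        best_distance = math.inf
        for template in self.templates:
            distance = math.dist(template.features, utterance)
            if distance < best_distance:
                best, best_distance = template, distance
        if best is None or best_distance > max_distance:
            return None  # no match: the caller would be asked to repeat
        return best.number, self.recordings[best.recording_id]


# Two templates are linked to a single stored recording.
directory = Directory()
directory.add_recording(1, b"the-boss-audio")
directory.enroll([0.0, 1.0], "555-0100", recording_id=1)
directory.enroll([1.0, 0.0], "555-0100", recording_id=1)

match = directory.lookup([0.1, 0.9])
if match is not None:
    number, audio = match
    # A real system would replay `audio` to the user before dialing `number`.
    print(number, len(directory.recordings))
```

In this sketch the number of stored recordings grows with the number of distinct recordings rather than with the number of templates, which is the source of the storage saving.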
The present invention provides significant advantages over the prior art. First, the storage requirements, and hence the cost, of the system can be greatly reduced because the number of recordings used in the system can be greatly reduced. Second, because the number of recordings is reduced, it is possible to store the recordings local to the processing circuitry, thereby increasing the responsiveness of the system.