1. Field of the Invention
The present invention relates to speech recognition methods and systems and more particularly to methods and systems whereby a speech recognition system is automatically configured. In an exemplary embodiment, the invention provides a method and system whereby a speech recognition unit within a speech recognition system is automatically configured with speech models and parameters associated with a particular user device.
2. Description of Related Art
Network-based speech recognition is used by customers for many tasks, including placing calls by speaking phrases. For example, a customer might say “Call John Smith,” and the speech recognition system places a phone call to John Smith. Alternatively, a customer might say “Dial 555-1234,” which causes that telephone number to be dialed. Speech recognition can be used in conjunction with other services as well. For example, a customer might want to retrieve and manage his or her voicemail messages by speaking certain phrases. In addition, a customer might use speech recognition to access his or her records in a financial institution and retrieve account information and balances.
Speech recognition has important advantages for customers of telecommunication services. For example, customers need no longer consult a telephone book or list to match a telephone number with a particular name. The customer need only say the name, and the number is automatically dialed. In addition, a customer who speaks a number has that number dialed automatically, which eliminates the possibility of misdialing it.
Conventional speech recognition systems comprise modules to recognize speech phrases. They also contain a database where speech models are stored. A speech recognition algorithm uses speech models and other parameters stored in the database to recognize voice messages. Speech models are created by recording thousands of speech utterances from human subjects. Each speech model is a mathematical model of a particular sound or collection of sounds in the language of these utterances. For example, a speech model can be created for each phoneme in the language, or for each of the “diphones” (i.e., two-phoneme groupings) in a language, or for larger sets of phoneme groupings such as “command words” or “command phrases” that need to be recognized by a speech recognition system.
Different types of devices are used by customers to send messages to speech recognition systems. For example, the customer may state a phrase through a telephone handset that is connected to a landline telephone line. Alternatively, the customer may place a call by using a hands-free device such as a speaker phone. Different technologies are also used for various user devices. For example, the customer may use a wireless CDMA handset, or a wireless CDMA speaker phone from within a mobile vehicle.
Speech recognition systems also have other features with adjustable parameters. For example, a “barge-in” feature refers to the situation where a user speaks a command when an announcement is playing over the phone. The “barge-in” feature stops the announcement and the command is then recognized by the speech recognition system. The “barge-in” feature typically has certain configurable parameters. For instance, for barge-in, the length and volume of sound energy may be configurable parameters.
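By way of illustration only, the following minimal sketch shows one way the length and volume parameters of a barge-in feature could be applied. The names `BargeInParams`, `min_energy`, `min_frames`, and `detect_barge_in`, as well as the per-frame energy representation, are assumptions made for this example and are not taken from any particular speech recognition system.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class BargeInParams:
    """Hypothetical configurable barge-in parameters."""
    min_energy: float  # "volume": minimum frame energy counted as speech
    min_frames: int    # "length": consecutive loud frames required


def detect_barge_in(frame_energies: Sequence[float], params: BargeInParams) -> int:
    """Return the index of the frame at which barge-in is declared, or -1.

    Barge-in is declared once `min_frames` consecutive frames each have
    energy of at least `min_energy`. A quieter frame resets the count.
    """
    run = 0
    for i, energy in enumerate(frame_energies):
        if energy >= params.min_energy:
            run += 1
            if run >= params.min_frames:
                return i
        else:
            run = 0
    return -1
```

Under these assumptions, a device with a distant microphone might be given a lower `min_energy` or a larger `min_frames` than a handset, so that background noise does not interrupt the announcement.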
The quality of voice communication varies based on the type of user device and the technology used with a particular user device. For example, a handset in which the microphone is held within a few inches of the speaker's mouth will have a different sound quality than a device with a more distant microphone such as a speaker phone.
The technology used and the medium over which the transmission is sent also affect the quality of reception. For example, the technology associated with a landline phone offers a different sound quality than the technologies associated with a digital PCS call or analog cellular phone call. In addition, the environment associated with the wireless units may provide more interference and background noise than the environment associated with the landline unit.
Because of these differences in quality of reception between various units, the inventor has discovered that the speech models and parameters that achieve high-quality recognition for one type of device, technology, or environment may not achieve high-quality recognition if used to recognize speech for other types of devices, other technologies, or other environments. For example, speech models appropriate for a landline device may not be appropriate for a hands-free unit. Additionally, speech models that closely match the attributes of a CDMA device may be inappropriate for a non-CDMA device.
In addition, because of these differences in quality between various units, the parameters associated with a particular feature for different device types may be different. For instance, the models associated with barge-in that are appropriate for a landline device may not be appropriate for a hands-free unit. Similarly, the parameters associated with barge-in that are appropriate for a CDMA device may be inappropriate for a non-CDMA device.
The present invention provides a method and system whereby a user device having an associated device type sends a message with this device type to a network-based speech recognition system. The network-based speech recognition system is then optimized for the particular user device using speech models and parameters associated with the device type.
In one embodiment of the present invention, a user device sends an initial message to a speech recognition system. Examples of user devices include landline handsets, landline speaker phones, CDMA handsets, and CDMA speaker phones. The speech recognition system responds to the initial message from the user device with an acknowledgement. In response to the acknowledgement, the user device transmits a message to the speech recognition system describing its device type. For instance, if the user device were a speaker phone using CDMA technology, a message including this information would be sent to the speech recognition system.
The speech recognition system includes a speech recognition unit. The speech recognition unit obtains the speech models and parameters associated with the device type from a database. The speech recognition unit receives these models and parameters from the database and is configured with them. The user device then transmits, and the speech recognition unit receives, a voice message. The speech recognition unit uses the configured models and parameters to process the voice messages from the user device. Thus, the method and system can automatically configure a speech recognition unit within a speech recognition system with speech models and parameters associated with a particular user device.
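The message exchange and configuration steps described above can be sketched as follows. This is an illustrative sketch only: the device-type strings, the `DeviceProfile` fields, the contents of `PROFILE_DB`, the `"ACK"` token, and all class and function names are hypothetical, and the `recognize` method is a placeholder that merely tags the message with the configured model rather than performing recognition.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class DeviceProfile:
    """Hypothetical speech models and parameters for one device type."""
    speech_model: str
    barge_in_min_energy: float
    barge_in_min_frames: int


# Hypothetical database keyed by device type; all values are made up.
PROFILE_DB: Dict[str, DeviceProfile] = {
    "landline-handset": DeviceProfile("models/landline_handset", 0.30, 5),
    "landline-speakerphone": DeviceProfile("models/landline_speakerphone", 0.45, 8),
    "cdma-handset": DeviceProfile("models/cdma_handset", 0.35, 6),
    "cdma-speakerphone": DeviceProfile("models/cdma_speakerphone", 0.50, 10),
}


class SpeechRecognitionUnit:
    def __init__(self, database: Dict[str, DeviceProfile]) -> None:
        self._database = database
        self.profile: Optional[DeviceProfile] = None

    def configure(self, device_type: str) -> None:
        # Obtain the models and parameters for this device type.
        # An unknown device type raises KeyError in this sketch.
        self.profile = self._database[device_type]

    def recognize(self, voice_message: str) -> str:
        if self.profile is None:
            raise RuntimeError("speech recognition unit is not configured")
        # Placeholder: a real unit would decode audio with the model.
        return f"[{self.profile.speech_model}] {voice_message}"


class SpeechRecognitionSystem:
    def __init__(self, database: Dict[str, DeviceProfile]) -> None:
        self.unit = SpeechRecognitionUnit(database)

    def on_initial_message(self) -> str:
        return "ACK"  # acknowledgement of the initial message

    def on_device_type(self, device_type: str) -> None:
        self.unit.configure(device_type)

    def on_voice_message(self, voice_message: str) -> str:
        return self.unit.recognize(voice_message)


def run_session(system: SpeechRecognitionSystem, device_type: str,
                voice_message: str) -> str:
    """Play the user-device side of the exchange, step by step."""
    if system.on_initial_message() != "ACK":
        raise RuntimeError("no acknowledgement received")
    system.on_device_type(device_type)              # report the device type
    return system.on_voice_message(voice_message)   # then send the voice message
```

In this sketch, calling `run_session(SpeechRecognitionSystem(PROFILE_DB), "cdma-speakerphone", "Call John Smith")` configures the unit with the hypothetical CDMA speaker phone profile before the voice message is processed.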
In another embodiment of the present invention, the speech recognition system includes a services module. The services module is activated by the user's voice commands. For example, the services module may automatically outdial phone calls. Thus, if the command requests that a particular call be outdialed, the services module outdials the correct call.
In yet another embodiment, the voice command requests that a service be accessed by the speech recognition system. For example, the voice message may request that a voicemail system be accessed or that account balances from a financial institution be retrieved.
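As a rough illustration of how recognized commands might be routed to outdialing or to another service, consider the sketch below. The function `route_command`, the `DIRECTORY` contents, the action labels, and the keyword matching are all assumptions introduced for this example; they do not describe how any actual services module is implemented.

```python
import re
from typing import Dict, Tuple

# Hypothetical name-to-number directory; the entry is made up.
DIRECTORY: Dict[str, str] = {"john smith": "555-0100"}


def route_command(text: str, directory: Dict[str, str]) -> Tuple[str, str]:
    """Map recognized text to an (action, argument) pair."""
    command = text.strip().lower().rstrip(".")

    # "Dial 555-1234": outdial the spoken number.
    match = re.fullmatch(r"dial ([0-9][0-9\- ]*)", command)
    if match:
        return ("outdial", match.group(1).strip())

    # "Call John Smith": look the name up, then outdial.
    match = re.fullmatch(r"call (.+)", command)
    if match:
        name = match.group(1)
        if name in directory:
            return ("outdial", directory[name])
        return ("unknown", name)

    # Other services are selected by simple keyword in this sketch.
    if "voicemail" in command:
        return ("voicemail", "")
    if "balance" in command:
        return ("account_balance", "")
    return ("unknown", command)
```

For instance, under these assumptions `route_command("Dial 555-1234", DIRECTORY)` yields an outdial action for that number, while a phrase containing "voicemail" selects the voicemail service.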
These as well as other features and advantages of the present invention will become apparent to those of ordinary skill in the art by reading the following detailed description, with appropriate reference to the accompanying drawings.