Voice recognition (VR) technologies have been applied in a number of business-related applications, including call centers. In a typical call center, an employee receives a telephone call from a customer who wishes to purchase one or more products and/or services. It is not unusual for a particular product or service in such an order to have various options associated with it. For example, when ordering a pizza, the customer must specify the size, crust, and toppings.
In order to eliminate live persons from telephone ordering systems, voice recognition technologies have been applied in which the customer talks solely with a voice recognition application that answers the customer's call. However, given today's technology, this approach presents a number of problems. An inaccurate transcription of voice to text can be particularly problematic in the context of an ordering system, since errors can have significant and immediate financial consequences. For example, if a customer intended to order fifteen large, thick-crust, sausage pizzas for an event, and the VR ordering system mistook “fifteen” for “fifty”, the resulting order would be more than three times the intended size, which would be unacceptable.
One of the problems with VR call center/ordering systems is that the accuracy of a VR system depends heavily on whether the system has been trained to recognize a particular user's voice. There is a significant degree of variability between users' voices, and training the system for a particular user often results in a substantial accuracy improvement. However, many of the users who call in to a VR call center/ordering system are first-time users, and thus no training data is available on the VR system for the new caller.
Furthermore, it is unlikely that a new caller to an ordering system would be willing to spend the time and effort needed to train the VR system to recognize the caller's voice, since such training sessions can last ten to twenty minutes. Finally, the telephone frequency bandwidth is limited to 300-3400 Hz. Since voice information usable for distinguishing among individuals exists outside of these frequencies, VR input and any training information handled over the telephone will of necessity not be as effective or accurate as that which takes into account the full range of the human voice.
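The effect of the 300-3400 Hz telephone band can be illustrated with a short sketch. The example below is only a rough illustration under stated assumptions, not a model of any particular telephone network: it treats a voiced sound as a series of harmonics of a fundamental frequency and counts how many fall inside the band. The 120 Hz fundamental, the 8000 Hz upper limit, and the function name `split_harmonics` are hypothetical values chosen for illustration; only the 300-3400 Hz band comes from the text above.

```python
# Telephone passband as stated in the text, in Hz.
TELEPHONE_BAND_HZ = (300.0, 3400.0)


def split_harmonics(f0_hz, upper_limit_hz=8000.0, band=TELEPHONE_BAND_HZ):
    """Split the harmonics of a voiced sound into those inside and outside a band.

    f0_hz          -- fundamental frequency of the voice (assumed, illustrative)
    upper_limit_hz -- highest harmonic frequency considered (assumed, illustrative)
    band           -- (low, high) passband edges in Hz
    Returns (kept, lost): lists of harmonic frequencies in Hz.
    """
    low, high = band
    kept, lost = [], []
    n = 1
    while n * f0_hz <= upper_limit_hz:
        freq = n * f0_hz
        if low <= freq <= high:
            kept.append(freq)   # passes through the telephone channel
        else:
            lost.append(freq)   # below 300 Hz or above 3400 Hz
        n += 1
    return kept, lost


if __name__ == "__main__":
    # Hypothetical speaker with a 120 Hz fundamental.
    kept, lost = split_harmonics(120.0)
    print(f"harmonics kept: {len(kept)}, lost: {len(lost)}")
    print(f"fundamental passes the band: {120.0 in kept}")
```

Under these assumed values, 26 of the 66 harmonics considered fall inside the telephone band, and the fundamental itself (120 Hz) lies below the 300 Hz lower edge, which is consistent with the observation that speaker-distinguishing information exists outside the band.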