A voice portal is a system that can be accessed entirely by voice. Ideally, any type of information, service, or transaction found on the Internet could be accessed through a voice portal. A mobile user with a cellular telephone might dial in to a voice portal application, request information using voice or touchtone keys, and receive the requested information via an audible prompt, via synthesized text to speech, or via some form of display (text message, pop-up window on a PDA, etc.). Depending on the user's needs, a voice portal may automate phone-based access to an information resource (email systems, Internet sites, databases, flat files, etc.), or it may assist in routing a caller to a specific human resource, for example someone in customer service.
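By way of illustration only, the two roles described above (automating access to an information resource, or routing the caller to a human resource) can be sketched as a simple dispatch step. The resource names, canned replies, and the extension "ext-100" are hypothetical placeholders and are not taken from any particular voice portal; the request is assumed to have already been converted to text.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class PortalResponse:
    kind: str     # "information" or "transfer"
    payload: str  # text to speak or display, or the destination to transfer to


# Hypothetical information resources; a real portal would query email
# systems, Internet sites, databases, flat files, and so on.
INFO_RESOURCES: Dict[str, Callable[[], str]] = {
    "weather": lambda: "placeholder weather report",
    "email": lambda: "placeholder email summary",
}

# Hypothetical human resources, keyed by the phrase the caller uses.
HUMAN_ROUTES: Dict[str, str] = {
    "customer service": "ext-100",
}


def handle_request(request: str) -> PortalResponse:
    """Answer from an information resource, or route the caller to a person."""
    key = request.strip().lower()
    if key in INFO_RESOURCES:
        return PortalResponse("information", INFO_RESOURCES[key]())
    if key in HUMAN_ROUTES:
        return PortalResponse("transfer", HUMAN_ROUTES[key])
    # Unknown request: reply with a spoken or displayed fallback message.
    return PortalResponse("information", "Sorry, that request was not understood.")
```

In this sketch the returned payload would subsequently be delivered as an audible prompt, as synthesized speech, or on a display, depending on the caller's device.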
There are two major categories of voice portals: consumer voice portals and enterprise voice portals. Consumer voice portals focus on giving the user access to information that is general in nature, such as weather, sports scores, and stock quotes. This information can generally be accessed via Internet protocols such as HTTP and via web services. Enterprise voice portals provide customized access to information more useful to employees, such as email, calendaring, inventory levels, etc. Both types of voice portals can also route calls to human resources. In the case of a consumer-oriented voice portal, the person may be a customer support or sales person; in the case of an enterprise voice portal, the person may be a colleague or sales lead. Both types rely heavily on resources such as ASR (automatic speech recognition) and TTS (text to speech).
Enterprise voice portals typically interoperate with the enterprise PBX and may use communication protocols specific to a particular PBX or call center. Hosted versions may use SS7 or IMS (IP Multimedia Subsystem) to ease integration issues and provide support for TDM (time division multiplexing) and VoIP (Voice over Internet Protocol). A conventional voice portal may include one or more Automatic Speech Recognition (ASR) systems and/or one or more Text-To-Speech (TTS) systems.
While the following description gives examples using resources such as ASR resources and TTS resources, the present invention should not be limited to only ASR and TTS resources. Other resources, including, but not limited to, video resources, speaker verification resources, telephony ports, and network bandwidth, are also within the scope of the present invention.
ASR is a technology that allows users of information systems to speak entries rather than punching numbers on a keypad. ASR can be used in place of keypad entry, and it is practically required whenever the input data is complex in nature. In recent years, ASR has become popular in the customer service departments of large corporations. ASR is also used by some government agencies and other organizations. Basic ASR systems recognize single-word entries such as yes-or-no responses and spoken numerals. This makes it possible for people to work their way through automated menus without having to enter dozens of numerals manually, and it also serves situations where a hands-free interface is required. In a manual-entry situation, a customer might hit the wrong key after having entered several numerals at intervals previously in the menu, and give up rather than call again and start over. ASR virtually eliminates this problem.
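As a minimal sketch of the basic case described above, the following assumes that a recognizer has already returned each spoken entry as a single word of text, and shows only how yes-or-no responses and spoken numerals might then be interpreted. The function names and vocabulary tables are illustrative assumptions; no actual speech recognition is performed here.

```python
from typing import List, Optional, Union

# Vocabulary of a hypothetical basic single-word recognizer.
YES_NO = {"yes": True, "no": False}
NUMERALS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
    "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9,
}


def interpret_entry(word: str) -> Optional[Union[bool, int]]:
    """Map one recognized word to a yes/no answer or a digit, else None."""
    token = word.strip().lower()
    if token in YES_NO:
        return YES_NO[token]
    if token in NUMERALS:
        return NUMERALS[token]
    return None  # outside the vocabulary; the caller would be re-prompted


def spoken_digits_to_string(words: List[str]) -> str:
    """Join a sequence of spoken numerals into a digit string."""
    digits = []
    for word in words:
        value = interpret_entry(word)
        # bool is a subclass of int in Python, so yes/no is excluded explicitly.
        if value is None or isinstance(value, bool):
            raise ValueError(f"not a spoken numeral: {word!r}")
        digits.append(str(value))
    return "".join(digits)
```

For example, an account number spoken as "four", "two", "seven" would be assembled without any keypad entry, which is the situation in which a mistyped key would otherwise force the caller to start over.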
Sophisticated ASR systems allow the user to enter direct queries or responses, such as a request for driving directions or the telephone number of a hotel in a particular town. This shortens the menu navigation process by reducing the number of decision points. It also reduces the number of instructions that the user must receive and comprehend. For institutions that rely heavily on customer service, such as airlines and insurance companies, ASR makes it possible to reduce the number of human call-center employees. Those people can then be trained for other jobs that are more profitable and interesting, such as complaint resolution, customer retention, or sales.
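The reduction in decision points can be illustrated with a small sketch that compares the number of menu choices needed to reach a service with a single direct query. The menu layout, the service names, and the keyword-spotting approach are all assumptions made for illustration; keyword spotting is a deliberately simplistic stand-in for the language understanding that a sophisticated ASR system would perform.

```python
from typing import Dict, Optional, Tuple, Union

Menu = Dict[str, Union[str, "Menu"]]

# Hypothetical nested menu: each level is one decision point for the caller.
MENU: Menu = {
    "travel": {
        "directions": "driving_directions",
        "hotels": {"phone": "hotel_phone_number", "rates": "hotel_rates"},
    },
    "news": {"weather": "weather_report"},
}

# Hypothetical keyword sets for direct queries (one decision point each).
DIRECT: Dict[Tuple[str, ...], str] = {
    ("hotel", "number"): "hotel_phone_number",
    ("directions",): "driving_directions",
}


def menu_decision_points(menu: Menu, target: str, depth: int = 0) -> Optional[int]:
    """Count the menu choices a caller must make to reach the target service."""
    for node in menu.values():
        if node == target:
            return depth + 1
        if isinstance(node, dict):
            found = menu_decision_points(node, target, depth + 1)
            if found is not None:
                return found
    return None


def direct_query(utterance: str) -> Optional[str]:
    """Pick a service when all of its keywords appear in the recognized text."""
    words = set(utterance.lower().split())
    for keywords, service in DIRECT.items():
        if all(keyword in words for keyword in keywords):
            return service
    return None
```

In this sketch, reaching the hotel telephone number through the menu takes three choices, whereas a direct request such as "I need the phone number of a hotel" selects the same service in one step.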
The technology of speech recognition has been around for some time. It is improving, but problems still exist. An ASR system cannot always correctly recognize the input from a person who speaks with a heavy accent or dialect, and it has major problems with people who combine words from two languages by force of habit. Marginal cell-phone connections can cause an ASR system to misinterpret the input.
TTS is a type of speech synthesis application that is used to create a spoken sound version of the text in a computer document, such as a help file or a web page. TTS can enable the reading of computer display information for the visually challenged person, or may simply be used to augment the reading of a text message. Current TTS applications include voice-enabled e-mail and spoken prompts in voice response systems. TTS is often used with voice recognition programs.