1. Field of the Invention
The present invention relates to the field of automated speech systems and, more particularly, to inferring switching conditions for switching between modalities in a speech application environment extended for text-based interactive services.
2. Description of the Related Art
Interactive Voice Response (IVR) systems are often used to provide automated customer service via a voice channel of a communication network. IVR systems permit routine customer requests to be quickly, efficiently, and automatically handled. When a request is non-routine or when a caller has difficulty with the IVR system, a transfer can be made from the IVR system to a customer service representative. Even when human interactions are needed, the IVR system can obtain necessary preliminary information, such as an account number and a reason for a call, which can ensure that callers are routed to an appropriate human agent and that human-to-human interactive time is minimized. Successful use of IVR systems allows call centers to be minimally manned while customers are provided a high level of service with relatively short periods spent in waiting queues.
IVR systems, especially robust ones having natural language understanding (NLU) capabilities and/or large context-free grammars, represent a huge financial and technological investment. This investment includes costs for purchasing and maintaining IVR infrastructure hardware, IVR infrastructure software, and voice applications executing upon this infrastructure. An additional and significant recurring cost can relate to maintaining a sufficient number of voice-quality channels to handle anticipated call volume. Further, each of these channels consumes an available port of a voice server, which has a limited number of costly ports. Each channel also consumes a quantity of bandwidth needed for establishing a voice-quality channel between a caller and the IVR system.
One innovative solution for extending an IVR infrastructure to permit text-based interactive services is detailed in co-pending patent application Ser. No. 11/612,996 entitled “Using an Automated Speech Application Environment to Automatically Provide Text-Based Interactive Services.” More specifically, the co-pending application teaches that a chat robot object, referred to as a Chatbot, can dynamically convert text received from a text-messaging client to input consumable by a voice server and can dynamically convert output from the voice server to text appropriately formatted for the client. From the perspective of the voice server, the text-based interactions with the text-messaging client are handled in the same manner and with the same hardware/software that is used to handle voice-based interactions. The enhanced speech application environment allows for a possibility of switching between modalities without interrupting a pre-existing communication session, which is elaborated upon in co-pending patent application Ser. No. 11/613,040 entitled “Switching Between Modalities in a Speech Application Environment Extended for Text-Based Interactive Services.”
A text-messaging modality and a voice modality each have different advantages and drawbacks. In a text modality, for example, a user may have difficulty entering lengthy responses. This is particularly true when a user has poor typing skills or is using a cumbersome keypad of a resource-constrained device (e.g., a smartphone) to enter text. In a voice modality, a speech recognition engine may have difficulty understanding a speaker who has a heavy accent or who speaks an obscure dialect. A speech recognition engine can also have difficulty understanding speech transmitted over a low-quality voice channel. Further, speech recognition engines can have low accuracy when recognizing proper nouns, such as names and street addresses. In all of these situations, difficulties may be easily overcome by switching from one modality to the other: from text messaging to voice in the first case, and from voice to text messaging in the others. No known system has an ability to switch between voice and text modalities during a communication session. Teachings regarding inferential modality switching are non-existent.