The following relates to the dialog system arts, customer support arts, call center arts, and related arts.
A dialog system is based on, or augments, a natural language interfacing device such as a chat interface or a telephonic device. A chat interface is typically Internet-based and enables a chat participant to enter and receive textual dialog utterances. Each participant operates his or her own chat interface, and typed or dictated utterances from each participant are displayed on both (or all) participants' chat interfaces, so that the participants can conduct a dialog. The chat interfaces may support other content exchange, e.g. one participant may present a (possibly multi-media) document that is displayed on both (or all) participants' chat interfaces. Other examples of natural language interfacing devices include telephonic devices such as conventional telephones or cellular telephones, or tablet computers, notebook computers, or the like having appropriate audio components (microphone and earphone or loudspeaker, or full audio headset). The telephonic device may provide audio-only interfacing, or the natural language dialog may be augmented by video (e.g., the telephonic device may be a video conferencing device).
Such dialog systems are ubiquitous and widely used for diverse applications. One common application is to provide customer support for customers, clients, users, or the like. In a typical arrangement, the support provider maintains a call center to which a customer, client, or the like places a call. The call center is staffed by call center agents who handle customer calls. Ideally, the call center agents have expertise in the product, service, or the like for which support is to be provided. In practice, however, call center agents have limited knowledge and expertise, which can limit their effectiveness.
One way to improve call center quality is through the use of a semi-automated or fully automated dialog system. In a semi-automated dialog system, the ongoing dialog is recorded and processed to predict an appropriate current utterance for the call center agent to express. The current utterance, or a list of current utterance candidates, is typically displayed on a display component of the natural language interfacing device. For example, if the interfacing device is a chat interface comprising a computer with a connected headset running chat software, then the list of current utterance candidates may be displayed in a suggestions window shown on the computer display. The call center agent can refer to this list and may choose one of the utterance candidates for responding to the caller (either verbatim or with edits made by the call center agent). Such an approach is semi-automated since the dialog system generates suggested utterances but the call center agent conducts the actual dialog. In an alternative, fully automated approach, the dialog system chooses a single "best" utterance and actually expresses that utterance to the caller, e.g. by automatically generating a typed response via a chat interface, or by operating a speech synthesizer to automatically "speak" the utterance to the caller at the opposite end of a telephonic call.
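The distinction between the two modes can be illustrated with a minimal sketch. The scoring function, function names, and candidate utterances below are illustrative assumptions, not part of any described implementation; a real system would rank candidates with a trained model rather than word overlap.

```python
# Sketch of semi-automated vs. fully automated response selection,
# assuming some scoring function over (dialog history, candidate) pairs.

def score(history, candidate):
    # Placeholder relevance score: word overlap with the last customer utterance.
    last_words = set(history[-1].lower().split())
    return len(last_words & set(candidate.lower().split()))

def suggest(history, candidates, k=3):
    """Semi-automated mode: return a ranked list for the agent to choose from."""
    return sorted(candidates, key=lambda c: score(history, c), reverse=True)[:k]

def respond(history, candidates):
    """Fully automated mode: pick the single 'best' utterance to express directly."""
    return max(candidates, key=lambda c: score(history, c))

history = ["Hi, my new phone will not power on."]
candidates = [
    "Have you tried charging the phone overnight?",
    "Our store hours are 9 to 5.",
    "Does the phone show any lights when plugged in?",
]

print(suggest(history, candidates))  # ranked suggestions shown to the agent
print(respond(history, candidates))  # single utterance a "virtual agent" would express
```

In the semi-automated mode the agent retains editorial control over the final utterance; in the fully automated mode the top-ranked candidate is expressed to the caller directly.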
Dialog systems can advantageously support a call center agent by providing response suggestions (semi-automated systems) or can replace the call center agent entirely by acting as a "virtual agent" (fully automated systems). However, for customer or client support systems or other dialog systems for knowledge domain-specific applications, a complication arises. The utterances generated by the dialog system are generally expected to exhibit expertise in a relatively narrow knowledge domain. For example, a dialog system of a call center maintained by an electronics retailer may be expected to provide expert advice regarding various consumer electronic devices, such as various makes/models of cellular telephones, computers, or so forth. Thus, the utterances generated by the dialog system should be correct as to information within this knowledge domain and pertinent to the current dialog point. At the same time, the utterances generated by the dialog system should be effective natural language communication, for example employing proper vocabulary and grammar, correctly using common phrases, and so forth.
A common dialog system architecture includes a natural language understanding (NLU) component, a natural language generation (NLG) component, and a "central orchestrator" usually referred to as a dialog manager (DM). The DM takes input from the NLU component, updates an internal state, consults a Knowledge Base to decide on a next dialog action (DA) to take, and communicates this DA to the NLG component, which generates a natural language utterance implementing the DA. Generating a cohesive integration of these various components for supporting or engaging in domain-specific dialog is difficult. The DM, through its Knowledge Base, can encapsulate domain-specific expertise, but the NLG component is not domain-specific and as such may improperly handle a domain-specific DA that is output by the DM. It is challenging to effectively train the NLU/DM/NLG chain to consistently communicate domain-specific information via natural language utterances that conform to proper vocabulary, grammar, and other natural language expression rules.
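The NLU/DM/NLG chain described above can be sketched as follows. All class names, the keyword-based parsing, and the toy Knowledge Base are illustrative assumptions standing in for trained models and a real knowledge store.

```python
# Minimal sketch of the NLU -> DM -> NLG pipeline for one dialog turn.
# Component names and internals are hypothetical, not an actual API.

class NLU:
    """Maps a raw utterance to a structured interpretation (intent + slots)."""
    def parse(self, utterance: str) -> dict:
        # A real NLU component would use a trained model; keyword matching here.
        if "battery" in utterance.lower():
            return {"intent": "report_issue", "slots": {"component": "battery"}}
        return {"intent": "unknown", "slots": {}}

class DialogManager:
    """Central orchestrator: updates internal state, consults a Knowledge Base,
    and decides on the next dialog action (DA)."""
    def __init__(self, knowledge_base: dict):
        self.kb = knowledge_base
        self.state = {}

    def next_action(self, interpretation: dict) -> dict:
        self.state.update(interpretation["slots"])  # update internal state
        component = self.state.get("component")
        if component in self.kb:                    # consult the Knowledge Base
            return {"act": "inform", "topic": component, "fact": self.kb[component]}
        return {"act": "request_clarification"}

class NLG:
    """Renders a dialog action as a natural language utterance."""
    def realize(self, da: dict) -> str:
        if da["act"] == "inform":
            return f"Regarding the {da['topic']}: {da['fact']}"
        return "Could you tell me more about the problem?"

# Chaining the three components for a single turn:
nlu = NLU()
dm = DialogManager({"battery": "try a full recharge cycle"})
nlg = NLG()
reply = nlg.realize(dm.next_action(nlu.parse("My phone battery drains fast")))
print(reply)
```

The difficulty noted above arises at the DM-to-NLG handoff: a generic `realize` method has no domain knowledge, so a DA carrying domain-specific content can be rendered incorrectly unless the whole chain is trained or engineered to handle it consistently.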