Spoken-dialog or natural language interaction systems for applications such as customer care increasingly use statistical language models and statistically-based semantic processing for recognition and analysis of utterances. The design and deployment of these dialog systems involve significant data collection in order to provide a corpus that is representative of the intended service and sufficiently large for development, training and evaluation. Consequently, the development of these natural language spoken dialog systems requires a large database of annotated speech utterances that adequately represent the way humans interact with the system. The speech utterances used to train dialog systems should not be collected from human-human interactions, as research has shown that human-human interactions differ substantially from human-machine interactions in terms of language characteristics and linguistic behavior.
The Wizard-of-Oz (WOZ) methodology has been used extensively to collect high-quality, machine-directed speech data for use in training dialog systems. The WOZ approach uses a hidden human agent or customer service representative to simulate the behavior of the dialog system so that the callers believe they are interacting with a dialog system. Best practices dictate, however, that thousands or tens of thousands of utterances need to be collected and transcribed in order to achieve adequate coverage in speech recognition and spoken language understanding in natural language dialog systems. At that volume, the WOZ approach does not scale in terms of the cost and time needed to collect the necessary data. Another concern with the WOZ approach is its lack of realism: in WOZ simulations, both the subjects and the wizard(s) are playing roles. The wizard, who is played by the researcher interested in collecting “natural” user utterances, is playing the role of a dialog system. The subjects, because they are taking part in a scientific experiment, are playing the role of real users performing real tasks in a real world setting.
Conventional data collection systems referred to as “ghost wizard” systems have also been proposed for collecting corpus data. A typical ghost wizard system plays an open prompt to callers, receives one caller utterance, plays another prompt to the caller saying the system did not understand the received utterance, receives yet another caller utterance, and then transfers the call to a human operator. The ghost wizard systems thus achieve data collection at the cost of negative caller experiences, as the callers are forced to repeat their requests. In addition, ghost wizard systems cannot be used in collecting follow-up dialogs, as they can only be used at the beginning of a conversation.
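The ghost wizard call flow described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the prompt wording, the `GhostWizardLog` record, and the `run_ghost_wizard` function are hypothetical names introduced here and are not taken from any particular conventional system. The `listen` callback stands in for whatever component plays a prompt and returns the caller's utterance.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Assumed prompt wording, for illustration only.
OPEN_PROMPT = "How may I help you?"
REPROMPT = "Sorry, I did not understand. Please repeat your request."


@dataclass
class GhostWizardLog:
    """Hypothetical record of one ghost wizard call."""
    prompts: List[str] = field(default_factory=list)
    utterances: List[str] = field(default_factory=list)
    transferred: bool = False


def run_ghost_wizard(listen: Callable[[str], str]) -> GhostWizardLog:
    """Play the open prompt, then the reprompt, collecting one caller
    utterance after each, and finally hand the call to a human operator."""
    log = GhostWizardLog()
    for prompt in (OPEN_PROMPT, REPROMPT):
        log.prompts.append(prompt)
        # The caller is forced to state the request twice.
        log.utterances.append(listen(prompt))
    # Exactly two utterances are collected, both at the start of the call;
    # nothing that follows the transfer is captured.
    log.transferred = True
    return log
```

The fixed two-turn loop makes both drawbacks visible: every caller repeats the request once, and no utterance after the transfer is ever recorded.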
Another conventional WOZ approach involves a data collection system for call-routing applications deployed in an actual dialog system environment. This conventional system allows a customer service representative to use a WOZ interface to produce machine-generated voice responses to callers, giving them the impression of human-machine interaction while still routing the calls correctly. Real-world data is thus collected without compromising the user experience. However, this conventional system is domain specific and tightly tied to the call-routing domain.
In contrast to the conventional systems, there is a need for a generic framework and automated approach for generating in-service data collection interfaces for dialog systems that involve an operator and are partially or even fully automated. Such a framework should be applicable to any domain that uses natural language dialog systems, without interrupting the ongoing natural workflow between real callers and operators. In addition, there is a need for an interactive system that automatically logs all data between callers and operators (e.g., wizards) in real time during the human-machine interaction and automatically annotates such data at various dialog module levels.