The present invention generally relates to learning systems used in modeling multi-party conversations, and more particularly relates to a deep learning method for selecting a response, and an addressee of that response, in reply to a speaker utterance in a multi-party conversation.
Real-world conversations often involve more than two speakers. Understanding such multi-party conversations is challenging because of complex speaker interactions: multiple speakers exchange messages with each other, each playing one of three roles (sender, addressee, or observer), and these roles vary across turns.
In an UBUNTU Internet Relay Chat (IRC) channel, for example, one user can initiate a discussion about an Ubuntu-related technical issue, and many other users can work together to solve the problem. Detecting the addressee of an utterance and predicting a response utterance in multi-party conversations has been a very difficult technical problem to solve. For example, consider a case where a responding speaker is talking to two other speakers in separate conversation threads (also referred to as dialog threads). The addressee is likely to be one of those two speakers, and the response must fit the thread of whichever speaker is addressed. In the past, inconsistent selection of addressee-response pairs has often caused confusion and inaccuracy in tracking multi-party conversations.