The exemplary embodiment relates to natural language generation and finds particular application in connection with a system and method for generating natural language from a dialog act generated by an automated spoken dialog system.
Natural Language Generation (NLG) is an important component of most Spoken Dialog Systems. Given a dialog act, which is a semantic representation of a piece of dialog to be generated, the aim is to convert the dialog act into natural language form. Dialog systems are most useful when the natural language utterance generated is adequate, fluent and also has human-like naturalness. However, using an automated NLG component to generate well-formed speech can be challenging. For example, a dialog system could generate a dialog act such as: inform(name=‘hotel lakeside’;phone=‘9134623000’;postcode=‘64158’). Given such a dialog act, a human could generate: “The phone number of the hotel lakeside is 9134623000 and its postcode is 64158.” However, the dialog system may generate an incorrect or poorly worded output, such as: “hotel hotel lakeside is at phone 9134623000 at postcode 64158.”
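By way of illustration only, the dialog act shown above can be read as an act type (here, inform) followed by a list of slot-value pairs. The following minimal sketch parses that textual form into a structured representation. The function name parse_dialog_act is hypothetical, straight quotes are substituted for the typographic quotes used in the text, and slot values are assumed to contain no semicolons.

```python
import re


def parse_dialog_act(text):
    """Split "act(slot='value';...)" into (act_type, {slot: value})."""
    match = re.fullmatch(r"\s*(\w+)\((.*)\)\s*", text)
    if match is None:
        raise ValueError(f"not a dialog act: {text!r}")
    act_type, body = match.groups()
    slots = {}
    # Assumption: pairs are separated by ';' and values contain no ';'.
    for pair in filter(None, body.split(";")):
        slot, _, value = pair.partition("=")
        slots[slot.strip()] = value.strip().strip("'\"")
    return act_type, slots


act_type, slots = parse_dialog_act(
    "inform(name='hotel lakeside';phone='9134623000';postcode='64158')"
)
print(act_type, slots)
```

A generator would take the act type and the slot dictionary as its input and would be expected to produce an utterance that mentions every slot value exactly once.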
Rule-based generators have been successful in some applications, but suffer from the problem of fixed, repetitive utterances, which are undesirable in NLG systems.
Recently, Neural Network (NN)-based approaches to Natural Language Processing (NLP) have been developed for applications in machine translation (Sutskever, et al., “Sequence to sequence learning with neural networks,” Advances in neural information processing systems (NIPS), pp. 3104-3112 (2014), hereinafter, “Sutskever 2014”), conversation modeling (Vinyals, et al., “A neural conversational model,” arXiv:1506.05869, pp. 1-8 (2015)), and sentiment classification and parsing (Tai, et al., “Improved semantic representations from tree-structured long short-term memory networks,” Proc. 53rd Annual Meeting of the ACL and the 7th Int'l Joint Conf. on Natural Language Processing, pp. 1556-1566 (2015)).
In particular, Recurrent Neural Network (RNN) based architectures, such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) RNNs, are often used in language modeling tasks due to their ability to model sequential information with long range dependencies (Hochreiter, et al., “Long short-term memory,” Neural computation, 9(8):1735-1780 (1997), hereinafter, “Hochreiter 1997”; Cho, et al., “Learning phrase representations using RNN encoder-decoder for statistical machine translation,” arXiv:1406.1078, pp. 1-15 (2014), hereinafter, “Cho 2014”).
An RNN-based Natural Language Generation approach for spoken dialog systems has been proposed by Wen, et al., “Semantically conditioned LSTM-based natural language generation for spoken dialogue systems,” arXiv:1508.01745, pp. 1-11 (2015), hereinafter, “Wen 2015.” In this approach, a standard LSTM cell is augmented to process the input dialog act in an unaligned manner to generate a corresponding utterance. The generator is word-based, i.e., it uses words as the smallest token. However, this word-based model relies on pre-processing the original data so that the named entities are substituted with placeholders, a process known as de-lexicalization. This is necessary because the word-level RNN is not able to “copy” the source words into the target, but has to learn a correspondence between them, which it can only do with a large amount of data.
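The de-lexicalization step described above can be sketched as follows. This is an illustrative example only and is not the pre-processing of Wen 2015: the function names delexicalize and relexicalize and the SLOT_ placeholder format are assumptions, and matching is by exact string replacement.

```python
def delexicalize(utterance, slots):
    """Replace each slot value found in the utterance with a placeholder."""
    # Longest values first, so a short value cannot overwrite part of a longer one.
    for slot, value in sorted(slots.items(), key=lambda kv: -len(kv[1])):
        utterance = utterance.replace(value, f"SLOT_{slot.upper()}")
    return utterance


def relexicalize(template, slots):
    """Substitute the original slot values back into a generated template."""
    for slot, value in slots.items():
        template = template.replace(f"SLOT_{slot.upper()}", value)
    return template


slots = {"name": "hotel lakeside", "phone": "9134623000", "postcode": "64158"}
utterance = (
    "The phone number of the hotel lakeside is 9134623000 "
    "and its postcode is 64158."
)
template = delexicalize(utterance, slots)
print(template)
print(relexicalize(template, slots))
```

The word-based generator is trained on templates such as the one printed first, so it never needs to produce the entity strings themselves. The slot values are restored only in post-processing, which is why any information carried by the entity (for example, its grammatical gender) is unavailable to the generator.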
Such an approach handles the morphological variation of a language poorly, because essential information is lost during the de-lexicalization process. For example, in languages whose verb forms depend on gender-specific information present in the named entities, utterances cannot be generated correctly once those entities have been replaced by placeholders. This approach also suffers from coordination issues when multiple occurrences of the same type of information exist in the dialog act. For example, if the aim is to convey that two different hotels accept credit cards, the model has to de-lexicalize the names of the hotels using sub-categories such as: “NAME-1” and “NAME-2.” The model would therefore need to include sub-categories within each placeholder in order to learn the interaction between them. Unknown words also pose problems.
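The coordination issue can be made concrete with a short sketch in which repeated slots of the same type receive indexed placeholders. The function name delexicalize_indexed and the second hotel name, “hotel riverside,” are hypothetical and are used only for illustration.

```python
def delexicalize_indexed(utterance, slot_values):
    """De-lexicalize (slot, value) pairs, numbering repeated slot types."""
    counts = {}
    mapping = {}
    for slot, value in slot_values:
        counts[slot] = counts.get(slot, 0) + 1
        placeholder = f"{slot.upper()}-{counts[slot]}"
        mapping[placeholder] = value
        utterance = utterance.replace(value, placeholder)
    return utterance, mapping


# Two values for the same slot type; the second name is a made-up example.
pairs = [("name", "hotel lakeside"), ("name", "hotel riverside")]
template, mapping = delexicalize_indexed(
    "hotel lakeside and hotel riverside both accept credit cards.", pairs
)
print(template)
print(mapping)
```

Each additional occurrence introduces a new placeholder token (NAME-1, NAME-2, and so on), so the generator must learn how these indexed tokens relate to one another rather than treating the slot type as a single symbol.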
There remains a need for a natural language generation method which addresses these problems.