The variability of human languages poses a challenge to human-machine communication: different forms may correspond to, or approximate, the same meaning. For example, a question answering system capable of answering the question “The GDP of which country has risen steadily since 2012?” might fail to answer the question “Which country's GDP has continued to increase since 2012?” although to humans, these two are obviously similar. The fields of paraphrase mining (the extraction of equivalent pairs from text) and paraphrase generation (the generation of paraphrase pairs based on rules or machine learning models) have tried to address this challenge.
Generating and identifying paraphrases is an important part of a large number of Natural Language Processing (NLP) tasks, due to the variability of language. Indeed, NLP systems must be able to handle this variability, as users can express the same or similar concepts in a number of different ways. Inspired by the success of neural machine translation, there have been a number of studies investigating the effectiveness of sequence-to-sequence models for paraphrase generation. The focus of these works has been the exploration of different neural network architectures, such as multiple layers of recurrent cells, residual connections between layers, and dropout.
Many NLP tasks, including automatic question answering and summarization or text generation, can benefit from the recognition or generation of paraphrases. For example, we can answer “How hot is it in Singapore in fall?” if we can relate it to the question “What is the temperature in Singapore in autumn?” and the answer to the latter is known. In a dialogue system, generating paraphrases of the system utterances may lead to more natural interaction. Recently, neural machine translation techniques based on sequence-to-sequence (seq2seq) models have been applied to the task of paraphrase generation with promising results.
Paraphrases are phrases or sentences that use different words or syntax to express the same or similar meaning (Rahul Bhagat and Eduard Hovy. 2013. What Is a Paraphrase? Computational Linguistics 39, 3 (2013), 463-472.). Given the variability of language, generating paraphrases for an input phrase or sentence is an important part of a number of Natural Language Processing (NLP) and Information Retrieval (IR) tasks. For example, if we can identify that a question submitted to a Question Answering (QA) system is a paraphrase of another question with a known answer, then the QA system can answer the first question as well. In the context of automatic query expansion, paraphrases can be used as alternative query reformulations (Karen Spärck Jones and John I. Tait. 1984. Automatic Search Term Variant Generation. Journal of Documentation 40, 1 (1984), 50-66.). Madnani and Dorr provide an overview of data-driven paraphrase generation (Nitin Madnani and Bonnie J. Dorr. 2010. Generating Phrasal and Sentential Paraphrases: A Survey of Data-driven Methods. Comput. Linguist. 36, 3 (2010), 341-387.).
One approach to paraphrase generation is to consider it as a monolingual translation task, where an input phrase is translated to its paraphrases (Chris Quirk, Chris Brockett, and William B. Dolan. 2004. Monolingual Machine Translation for Paraphrase Generation. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing (EMNLP '04). 142-149. Sander Wubben, Antal van den Bosch, and Emiel Krahmer. 2010. Paraphrase Generation As Monolingual Translation: Data and Evaluation. In Proceedings of the 6th International Natural Language Generation Conference (INLG '10). 203-207.). Following the recent advances in neural machine translation and sequence-to-sequence models (Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural Machine Translation by Jointly Learning to Align and Translate. In Proceedings of the International Conference on Learning Representations (ICLR). Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to Sequence Learning with Neural Networks. In Proceedings of the 27th International Conference on Neural Information Processing Systems (NIPS '14). 3104-3112.), there have been a number of works in which neural machine translation models have been applied to the task of paraphrase generation (Sadid A. Hasan, Bo Liu, Joey Liu, Ashequl Qadir, Kathy Lee, Vivek Datla, Aaditya Prakash, and Oladimeji Farri. 2016. Neural Clinical Paraphrase Generation with Attention. In Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP). 42-53. Jonathan Mallinson, Rico Sennrich, and Mirella Lapata. 2017. Paraphrasing Revisited with Neural Machine Translation. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers (EACL '17). 881-893. Aaditya Prakash, Sadid A. Hasan, Kathy Lee, Vivek Datla, Ashequl Qadir, Joey Liu, and Oladimeji Farri. 2016. Neural Paraphrase Generation with Stacked Residual LSTM Networks. In Proceedings of the 26th International Conference on Computational Linguistics: Technical Papers (COLING '16). 2923-2934.). However, a shortcoming of these works is that they have focused on exploring alternative neural network configurations and have only employed the surface form of words, ignoring any syntactic information, such as Part-of-Speech (POS) tags.
Accordingly, one key problem associated with these approaches is that they have only relied on words, ignoring any syntactic information, such as Part-of-Speech (POS) tags, grammatical functions, or named entities. Existing models only incorporate word strings and ignore linguistic meta-data.
Mining and Generating Paraphrases. Lin and Pantel (Dekang Lin and Patrick Pantel. 2001. DIRT—Discovery of Inference Rules from Text. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '01). 323-328.) described DIRT (Discovery of Inference Rules from Text), a method to mine paraphrase pairs, which they called inference rules, from text using a dependency representation. Sentences are parsed using Lin's rule-based Minipar parser, and the near meaning equivalence of two phrases is then established via a similarity comparison of their dependency paths and using mutual information. Lin and Pantel's work was the first theory-agnostic, large-scale attempt to extract paraphrase pairs.
Paşca and Dienes (Marius Paşca and Péter Dienes. 2005. Aligning Needles in a Haystack: Paraphrase Acquisition Across the Web. In Proceedings of the Second International Joint Conference on Natural Language Processing (IJCNLP '05). 119-130.) presented a method for mining paraphrases from a Web crawl by aligning n-gram fragments bounded by shared anchor n-grams on either side. The method is notable for working with noisy input text, for not requiring a parallel corpus, and for relying only on a sentence splitter and a POS tagger. The length of the fixed anchors on either side determines the specificity of the extracted paraphrases. From a crawl of 972 million Web pages, between 13.9 thousand and 41.7 million paraphrase pairs are extracted, depending on the settings. The precision of the paraphrases was not evaluated on a random sample in a component-based evaluation, as one would expect; instead, the top 100, middle 100 and bottom 100 paraphrases of a run were evaluated. An additional task-based evaluation on date-related questions from the TREC QA track showed an improvement of 10-20 points in the mean reciprocal rank of the correct answer.
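The anchor-based alignment idea described above can be illustrated with a minimal Python sketch. It is not the implementation of Paşca and Dienes: the function name `mine_anchor_paraphrases` and its parameters are hypothetical, tokenization is a plain whitespace split, and the sentence splitting, POS-based filtering and ranking of the original method are omitted. The sketch only shows how word sequences that occur between identical left and right anchor n-grams can be paired as paraphrase candidates.

```python
from collections import defaultdict
from itertools import combinations


def mine_anchor_paraphrases(sentences, anchor_len=2, min_len=1, max_len=3):
    """Pair word sequences that occur between identical left/right anchors.

    Hypothetical helper for illustration only; all names are assumptions.
    """
    by_anchor = defaultdict(set)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for mid_len in range(min_len, max_len + 1):
            span = anchor_len + mid_len + anchor_len
            for start in range(len(tokens) - span + 1):
                mid_start = start + anchor_len
                mid_end = mid_start + mid_len
                left = tuple(tokens[start:mid_start])
                middle = tuple(tokens[mid_start:mid_end])
                right = tuple(tokens[mid_end:start + span])
                # Fragments sharing both anchors are collected together.
                by_anchor[(left, right)].add(middle)

    pairs = set()
    for middles in by_anchor.values():
        # Every two distinct fragments between the same anchors form a pair.
        for a, b in combinations(sorted(middles), 2):
            pairs.add((" ".join(a), " ".join(b)))
    return pairs


sentences = [
    "the city was ravaged by the flood last year",
    "the city was devastated by the flood last year",
]
candidates = mine_anchor_paraphrases(sentences, anchor_len=2, max_len=1)
```

In this sketch, a larger `anchor_len` requires more shared context before two fragments are paired, which mirrors the trade-off between anchor length and specificity noted above.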
The RTE (Recognizing Textual Entailment) shared task, held seven times to date, first as part of the PASCAL initiative and now as part of the Text Analysis Conference (TAC), has investigated entailment recognition based on short text fragments (Ido Dagan, Oren Glickman, and Bernardo Magnini. 2006. The PASCAL Recognising Textual Entailment Challenge. In Proceedings of the 1st International Conference on Machine Learning Challenges: Evaluating Predictive Uncertainty Visual Object Classification, and Recognizing Textual Entailment (MLCW '05). 177-190.). Entailment can be seen as a unidirectional version of the paraphrase relation. Zhao, Wang and Liu (Shiqi Zhao, Haifeng Wang, and Ting Liu. 2010. Paraphrasing with Search Engine Query Logs. In Proceedings of the 23rd International Conference on Computational Linguistics (COLING '10). 1317-1325.) described one of the earliest attempts to mine paraphrases from search query logs. Biran, Blevins and McKeown mined templatized (i.e., slightly abstracted) sentential paraphrase sets with typed variable slots from plain text. Madnani and Dorr (Nitin Madnani and Bonnie J. Dorr. 2010. Generating Phrasal and Sentential Paraphrases: A Survey of Data-driven Methods. Comput. Linguist. 36, 3 (2010), 341-387.) provided a survey of techniques for generating paraphrases from data. Androutsopoulos and Malakasiotis also surveyed methods for paraphrasing and textual entailment recognition, generation and extraction. More recently, Yin et al. modelled sentence pairs with an attention-based Convolutional Neural Network (AB-CNN). Their three models were applied to the tasks of answer selection (AS) in question answering, paraphrase identification (PI) and textual entailment (TE).
Paraphrasing as Machine Translation. In the context of machine translation, there have been attempts to extract paraphrase pairs from parallel corpora or other translation pairs. Quirk, Brockett and Dolan (Chris Quirk, Chris Brockett, and William B. Dolan. 2004. Monolingual Machine Translation for Paraphrase Generation. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing (EMNLP '04). 142-149.) investigated the use of statistical machine translation to generate new paraphrases for input sentences, based on a large number of aligned sentence pairs extracted from news articles on the Web. Bannard and Callison-Burch (Colin Bannard and Chris Callison-Burch. 2005. Paraphrasing with Bilingual Parallel Corpora. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics (ACL '05). 597-604.) extracted paraphrases from a bilingual parallel corpus by treating paraphrases as pairs of phrases in one language that are aligned with the same phrase in the second language. In the context of Natural Language Generation (NLG), Wubben, Van Den Bosch and Krahmer generated sentential paraphrases using statistical machine translation trained on a corpus of news headlines and demonstrated improved performance compared to a word substitution method.
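The bilingual pivoting idea of Bannard and Callison-Burch can be sketched as follows: the score of a candidate paraphrase e2 for a phrase e1 is obtained by summing, over all foreign phrases f aligned with e1, the product p(f | e1) · p(e2 | f). The Python below is a minimal sketch under stated assumptions: the function name `pivot_paraphrases` and the two toy phrase tables are hypothetical, and the probabilities are invented for illustration rather than estimated from a parallel corpus.

```python
from collections import defaultdict


def pivot_paraphrases(src_to_pivot, pivot_to_src, phrase):
    """Score paraphrase candidates of `phrase` by marginalizing over pivots.

    src_to_pivot[e][f] holds p(f | e); pivot_to_src[f][e] holds p(e | f).
    Both tables are assumed inputs (e.g. derived from phrase alignment).
    """
    scores = defaultdict(float)
    for foreign, p_f_given_e1 in src_to_pivot.get(phrase, {}).items():
        for candidate, p_e2_given_f in pivot_to_src.get(foreign, {}).items():
            if candidate != phrase:  # a phrase is not its own paraphrase
                scores[candidate] += p_f_given_e1 * p_e2_given_f
    # Highest-scoring candidates first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy English-German phrase tables with made-up probabilities.
en_to_de = {"under control": {"unter kontrolle": 0.75, "im griff": 0.25}}
de_to_en = {
    "unter kontrolle": {"under control": 0.5, "in check": 0.5},
    "im griff": {"under control": 0.5, "in check": 0.25, "in hand": 0.25},
}
ranked = pivot_paraphrases(en_to_de, de_to_en, "under control")
```

With these toy tables, "in check" is reachable through both pivot phrases and therefore accumulates a higher score than "in hand", which is reachable through only one.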
Encoder-Decoder Models. The introduction of encoder-decoder models to machine translation has led to important improvements in the quality of machine translation. The basic idea of encoder-decoder models is that the decoder, which corresponds to a neural language model, generates the output sequence starting from an initial state that corresponds to the hidden state of the encoder, which has processed the input sequence. Kalchbrenner and Blunsom (Nal Kalchbrenner and Phil Blunsom. 2013. Recurrent Continuous Translation Models. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP '13). 1700-1709.) introduced a model where the encoder and the decoder were a convolutional neural network and a recurrent neural network, respectively. Sutskever, Vinyals and Le (Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to Sequence Learning with Neural Networks. In Proceedings of the 27th International Conference on Neural Information Processing Systems (NIPS '14). 3104-3112.) introduced the sequence-to-sequence model, which uses a recurrent neural network with LSTM units for both the encoder and the decoder. They also found that reversing the input led to improved performance, possibly due to the close alignment between source and target languages in the machine translation task they evaluated the model on (translating English to French). With the input reversed, the initial words of the input sequence are processed last by the encoder and can therefore have a higher impact on the encoder's hidden state, which is then used to condition the decoder's generated sequence. Bahdanau, Cho and Bengio (Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural Machine Translation by Jointly Learning to Align and Translate. In Proceedings of the International Conference on Learning Representations (ICLR).) used a bi-directional RNN to model the encoder, and introduced an attention mechanism, which generates one vector representation for each word in the input sequence, instead of encoding the whole input with just one vector. Luong, Pham and Manning refined the idea of attention by computing attention from a window of words, instead of using all of the input words, and concatenated the attention vectors with the hidden state vectors of the decoder before predicting the words of the output sequence. Neubig (Graham Neubig. 2017. Neural Machine Translation and Sequence-to-sequence Models: A Tutorial. arXiv preprint arXiv:1703.01619 (2017).) offered a recent and comprehensive overview of sequence-to-sequence models for machine translation.
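To make the attention idea concrete, the following Python sketch computes attention weights over per-word encoder states and the resulting context vector for a single decoder step. It is an illustration under stated assumptions, not the model of any cited work: the function names `softmax` and `attend` are hypothetical, the vectors are toy values, and a plain dot product is used as the scoring function, whereas Bahdanau, Cho and Bengio scored each encoder state with a small learned feed-forward network.

```python
import math


def softmax(scores):
    """Normalize raw scores into weights that sum to one."""
    highest = max(scores)  # subtracted for numerical stability
    exps = [math.exp(score - highest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def attend(decoder_state, encoder_states):
    """Return (weights, context) for one decoder step.

    Dot-product scoring is an assumption made for brevity.
    """
    scores = [
        sum(d * h for d, h in zip(decoder_state, state))
        for state in encoder_states
    ]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # The context vector is the weighted sum of the encoder states.
    context = [
        sum(weight * state[i] for weight, state in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# One toy vector per input word, plus the current decoder state.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
weights, context = attend(decoder_state, encoder_states)
```

Because a separate set of weights is computed at every decoder step, the decoder can draw on different input words at different points in the output, rather than relying on a single fixed vector for the whole input.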
Neural machine translation models have also been applied to the task of paraphrase generation. Hasan et al. (Sadid A. Hasan, Bo Liu, Joey Liu, Ashequl Qadir, Kathy Lee, Vivek Datla, Aaditya Prakash, and Oladimeji Farri. 2016. Neural Clinical Paraphrase Generation with Attention. In Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP). 42-53.) used a bi-directional RNN for encoding and an RNN for decoding paraphrases, with the attention mechanism by Bahdanau, Cho and Bengio (Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural Machine Translation by Jointly Learning to Align and Translate. In Proceedings of the International Conference on Learning Representations (ICLR).). They also represented input and output as sequences of characters, instead of word sequences. Prakash et al. (Aaditya Prakash, Sadid A. Hasan, Kathy Lee, Vivek Datla, Ashequl Qadir, Joey Liu, and Oladimeji Farri. 2016. Neural Paraphrase Generation with Stacked Residual LSTM Networks. In Proceedings of the 26th International Conference on Computational Linguistics: Technical Papers (COLING '16). 2923-2934.) used a similar architecture with more layers and residual connections, where the input of one layer is also made available to the layer above. Mallinson, Sennrich and Lapata (Jonathan Mallinson, Rico Sennrich, and Mirella Lapata. 2017. Paraphrasing Revisited with Neural Machine Translation. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers (EACL '17). 881-893.) applied the bilingual pivoting approach proposed by Bannard and Callison-Burch (Colin Bannard and Chris Callison-Burch. 2005. Paraphrasing with Bilingual Parallel Corpora. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics (ACL '05). 597-604.) with neural machine translation, where the input sequence is mapped to a number of translations in different languages, and then these translations are mapped back to the original language.
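The notion of a residual connection mentioned above can be sketched in a few lines of Python. This is a toy illustration only: the function name `stack_with_residuals` is hypothetical, the "layers" are simple element-wise functions rather than LSTM layers, and a residual connection is added after every layer, which is an assumption and not necessarily the configuration used by Prakash et al.

```python
def stack_with_residuals(layers, inputs):
    """Apply layers in sequence, adding each layer's input to its output.

    Each layer is assumed to map a vector to a vector of the same length.
    """
    current = list(inputs)
    for layer in layers:
        output = layer(current)
        # Residual connection: the layer input is passed to the layer above
        # together with the layer output.
        current = [out + inp for out, inp in zip(output, current)]
    return current


# Toy stand-ins for recurrent layers (hypothetical, for illustration).
toy_layers = [
    lambda vector: [2.0 * value for value in vector],
    lambda vector: [value + 1.0 for value in vector],
]
result = stack_with_residuals(toy_layers, [1.0, 2.0])
```

Because each layer's input is added back to its output, information from lower layers reaches the upper layers directly, which is commonly cited as a reason deeper stacks become easier to train.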
One key shortcoming common to all of these approaches is that none of them used additional linguistic information, such as POS tags, to enhance the quality of the generated paraphrases. An additional problem is that the encoder-decoder models used randomly initialized embeddings.
What is needed is a system adapted to consider and synthesize syntactic information, such as linguistic or Part-of-Speech (POS) tags, grammatical functions, or named entities, in conjunction with neural network architectures to improve performance, especially in connection with paraphrase generation. A system is also needed that uses pretrained values to populate one or more embedding matrices at initialization.
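As a purely illustrative sketch of the two needs stated above, the following Python shows one way an embedding matrix could be populated from pretrained values at initialization, and one way word and POS tag embeddings could be combined into a single input representation. It is not a description of any particular system: the function names `build_embedding_matrix` and `embed_tagged`, the toy vectors, the random fallback range, and the choice of concatenation are all assumptions.

```python
import random


def build_embedding_matrix(vocab, pretrained, dim, seed=0):
    """Return one row per vocabulary item.

    Rows are copied from `pretrained` when available; other items receive
    small random values (an assumed fallback).
    """
    rng = random.Random(seed)
    matrix = []
    for item in vocab:
        if item in pretrained:
            matrix.append(list(pretrained[item]))
        else:
            matrix.append([rng.uniform(-0.1, 0.1) for _ in range(dim)])
    return matrix


def embed_tagged(tokens, word_index, word_matrix, tag_index, tag_matrix):
    """Concatenate the word embedding and POS tag embedding of each token."""
    return [
        word_matrix[word_index[word]] + tag_matrix[tag_index[tag]]
        for word, tag in tokens
    ]


# Toy vocabulary, tag set and pretrained vectors (invented values).
words = ["which", "country", "gdp"]
tags = ["WDT", "NN"]
pretrained_words = {"country": [0.1, 0.2], "gdp": [0.3, 0.4]}

word_matrix = build_embedding_matrix(words, pretrained_words, dim=2)
tag_matrix = build_embedding_matrix(tags, {}, dim=1, seed=1)
word_index = {word: i for i, word in enumerate(words)}
tag_index = {tag: i for i, tag in enumerate(tags)}

inputs = embed_tagged(
    [("which", "WDT"), ("country", "NN")],
    word_index, word_matrix, tag_index, tag_matrix,
)
```

Concatenation keeps the word and tag information in separate dimensions of the input vector; other ways of combining the two would be equally compatible with the need stated above.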