The exemplary embodiment relates to natural language understanding and finds particular application in connection with a system and method for predicting canonical forms for natural language text.
Semantic Parsing, as used herein, refers to techniques for learning how to map natural language utterances into logical representations that can be operationally interpreted. Mapping natural language utterances to logical forms is useful in various applications involving natural language understanding, such as in the automation of call-centers and in question-answering systems. See, for example, Zettlemoyer, et al., “Learning to map sentences to logical form: Structured classification with probabilistic categorial grammars,” Proc. 21st Conf. in Uncertainty in Artificial Intelligence (UAI '05), pp. 658-666, 2005; Berant, et al., “Semantic parsing via paraphrasing,” ACL (1), pp. 1415-1425, 2014; Kwiatkowski, et al., “Scaling semantic parsers with on-the-fly ontology matching,” Proc. 2013 Conf. on Empirical Methods in Natural Language Processing (EMNLP 2013), pp. 1545-1556, 2013; Artzi, et al., “Weakly supervised learning of semantic parsers for mapping instructions to actions,” Trans. ACL, 1(1):49-62, 2013. In question answering, for example, the goal is to be able to process a complex question formulated in natural language, map it into a logical representation, and then retrieve an answer to that question from a Knowledge Base.
Several approaches use paraphrases to build a semantic parser. See, for example, Fader, et al., “Paraphrase-driven learning for open question answering,” Proc. 51st Annual Meeting of the ACL (Vol. 1: Long Papers), pp. 1608-1618, 2013; Berant, et al., “Semantic parsing on Freebase from question-answer pairs,” Empirical Methods in Natural Language Processing (EMNLP), vol. 2, no. 5, p. 6, 2013; Bordes, et al., “Open question answering with weakly supervised embedding models,” Machine Learning and Knowledge Discovery in Databases, Vol. 8724 of the series Lecture Notes in Computer Science, pp. 165-180, 2014. These methods typically use paraphrases to learn useful lexical features or to improve sentence embeddings.
Recently, an approach was proposed for quickly developing semantic parsers for new knowledge bases and domains when no training data initially exists (Wang, et al., “Building a semantic parser overnight,” Proc. 53rd Annual Meeting of the ACL and 7th Intl Joint Conf. on Natural Language Processing of the Asian Federation of Natural Language Processing (ACL 2015), Vol. 1: Long Papers, pp. 1332-1342, 2015, hereinafter, Wang 2015). In this approach, referred to herein as SPO, a small generic grammar is used to generate so-called canonical (textual) forms and pair them with logical forms. Crowdsourcing is then used to paraphrase these canonical forms into natural utterances. The crowdsourcing thus creates a dataset consisting of (u,c,lf) tuples, where u is a natural language utterance, c is its canonical form, and lf is the logical form associated with c by the grammar. Finally, a semantic parser is learned over this dataset. In the method of Wang 2015, SPO parses a natural utterance by first retrieving a list of possible logical forms and then learning to rank them. The performance of the SPO method, however, has not been good overall, with an accuracy of less than 50% reported for several domains. In this context, oracle accuracy is the proportion of utterances for which the retrieved list contains a correct logical form, and ranking accuracy is the proportion of utterances for which the correct logical form is ranked in first position. The poor performance may be due to low oracle accuracy, as the retrieved list of logical forms often does not contain the correct one.
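By way of illustration only, the two accuracy measures discussed above may be sketched as follows. The sketch is not taken from Wang 2015: the class name ParsedExample, the function names, and the toy logical-form strings are hypothetical, and ranking accuracy is assumed here to be computed over all utterances (rather than only over those whose retrieved list contains the correct logical form), so that it can never exceed the oracle accuracy.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ParsedExample:
    """One evaluation item (hypothetical structure, for illustration)."""
    utterance: str         # natural language utterance u
    gold_lf: str           # correct logical form lf
    candidates: List[str]  # retrieved logical forms, best-ranked first


def oracle_accuracy(examples: List[ParsedExample]) -> float:
    """Fraction of examples whose retrieved list contains the gold form."""
    hits = sum(1 for ex in examples if ex.gold_lf in ex.candidates)
    return hits / len(examples)


def ranking_accuracy(examples: List[ParsedExample]) -> float:
    """Fraction of examples whose top-ranked candidate is the gold form.

    Assumption: the denominator is all examples, not only those whose
    retrieved list contains the gold form.
    """
    hits = sum(
        1 for ex in examples
        if ex.candidates and ex.candidates[0] == ex.gold_lf
    )
    return hits / len(examples)


# Toy data with invented logical-form strings.
examples = [
    # Gold form retrieved and ranked first.
    ParsedExample("u1", "lf_a", ["lf_a", "lf_b"]),
    # Gold form retrieved but ranked second.
    ParsedExample("u2", "lf_c", ["lf_d", "lf_c"]),
    # Gold form missing from the retrieved list.
    ParsedExample("u3", "lf_e", ["lf_f", "lf_g"]),
    ParsedExample("u4", "lf_h", ["lf_i"]),
]

print(oracle_accuracy(examples))   # 0.5
print(ranking_accuracy(examples))  # 0.25
```

In this toy data, half of the retrieved lists omit the correct logical form, so no ranking model, however good, could raise the ranking accuracy above 0.5. This is the sense in which low oracle accuracy limits overall performance.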