The exemplary embodiment relates to natural language processing and finds particular application in connection with a system and method for prediction of structured forms based on natural language utterances.
Mapping natural language utterances (NLU) to logical forms (LF), a process known as semantic parsing, has various applications, such as in the building of Question-Answering systems (Tom Kwiatkowski, et al., “Scaling Semantic Parsers with On-the-Fly Ontology Matching,” Proc. Conf. on Empirical Methods in Natural Language Processing (EMNLP), pp. 1545-1556, 2013; J. Berant et al., “Semantic Parsing on Freebase from Question-Answer Pairs,” EMNLP, pp. 1533-1544, 2013; J. Berant, et al., “Semantic Parsing via Paraphrasing,” Association for Computational Linguistics (ACL), pp. 1415-1425, 2014). In question answering, the goal is to be able to process a question formulated in natural language, map it into a logical form, and then retrieve an answer to that question from a knowledge base.
Difficulties arise when the natural language utterance is a semantically complex question, leading to a logical form query with a substantial amount of compositionality (Panupong Pasupat, et al., “Compositional Semantic Parsing on Semi-Structured Tables,” ACL (1), pp. 1470-1480, 2015).
Methods for building semantic parsers are described, for example, in Wang, et al., “Building a semantic parser overnight,” Proc. 53rd Annual Meeting of the ACL and 7th Intl Joint Conf. on Natural Language Processing of the Asian Federation of Natural Language Processing (ACL 2015), Vol. 1: Long Papers, pp. 1332-1342, 2015 (hereinafter, Wang 2015). In that approach, a small generic grammar is used to generate so-called canonical (textual) forms and pair them with logical forms. Crowdsourcing is then used to paraphrase these canonical forms into natural utterances. The crowdsourcing thus creates a dataset (referred to herein as the SPO dataset) consisting of (NL, CF, LF) tuples, where NL is a natural language utterance, CF is its canonical form, and LF is the logical form associated with CF by the grammar. A semantic parser is then learnt over this dataset.
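By way of illustration only, the construction of (NL, CF, LF) tuples can be sketched as follows. This is a minimal sketch under stated assumptions, not the grammar or data of Wang 2015: the lexicon entries, the single canonical-form template, the logical-form syntax, and the names `SPOExample`, `generate_pairs`, `build_dataset`, and `PARAPHRASES` are all hypothetical, and the hand-written paraphrase table merely stands in for the crowdsourcing step.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SPOExample:
    nl: str  # natural language utterance (a paraphrase of cf)
    cf: str  # canonical form generated by the grammar
    lf: str  # logical form paired with cf by the grammar


# Hypothetical toy lexicon: canonical wording -> logical symbol.
ENTITY_TYPES: Dict[str, str] = {"article": "type.article"}
PROPERTIES: Dict[str, str] = {
    "publication date": "publication_date",
    "author": "author",
}
# Hypothetical values that each property may take.
VALUES: Dict[str, List[str]] = {
    "publication_date": ["2004"],
    "author": ["alice"],
}


def generate_pairs() -> List[Tuple[str, str]]:
    """Expand one grammar template into (CF, LF) pairs."""
    pairs = []
    for (type_cf, type_lf), (prop_cf, prop_lf) in product(
        ENTITY_TYPES.items(), PROPERTIES.items()
    ):
        for value in VALUES[prop_lf]:
            cf = f"{type_cf} whose {prop_cf} is {value}"
            lf = f"(filter {type_lf} {prop_lf} = {value})"
            pairs.append((cf, lf))
    return pairs


def build_dataset(
    pairs: List[Tuple[str, str]], paraphrases: Dict[str, List[str]]
) -> List[SPOExample]:
    """Attach each available paraphrase (NL) to its (CF, LF) pair."""
    dataset = []
    for cf, lf in pairs:
        for nl in paraphrases.get(cf, []):
            dataset.append(SPOExample(nl=nl, cf=cf, lf=lf))
    return dataset


# Stand-in for crowdsourced paraphrases of one canonical form.
PARAPHRASES: Dict[str, List[str]] = {
    "article whose publication date is 2004": [
        "articles published in 2004",
        "what articles came out in 2004",
    ]
}
```

Calling `build_dataset(generate_pairs(), PARAPHRASES)` yields one tuple per paraphrase, each carrying the canonical form and logical form from which the paraphrase was elicited.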
Wang 2015 learns a semantic parser on this dataset by first learning a log-linear similarity model based on a number of features (word matches, PPDB matches, matches between semantic types and parts of speech (POS), etc.) between the NL and the correct (CF, LF). At decoding time, a natural utterance NL is parsed by searching among the derivations of the grammar for one whose projected (CF, LF) is most similar to the NL according to the log-linear model. The search is based on a so-called “floating parser,” as described in Panupong Pasupat, et al., “Compositional Semantic Parsing on Semi-Structured Tables,” arXiv:1508.00305, 2015. The floating parser is a modification of a standard chart parser that is able to guide the search based on the similarity features.
Although the parser used in Wang 2015 does not have good accuracy in many domains, the crowdsourcing approach has proved useful in generating training data, as described in copending application Ser. No. 14/811,005, discussed below.
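For illustration, log-linear scoring of candidate (CF, LF) pairs against an utterance can be sketched as follows. This is a simplified sketch and not the parser of Wang 2015: the two features, the weights, and the function names `features`, `score`, and `rank` are assumptions, and the candidates are supplied as an explicit list rather than being found by a floating-parser search over grammar derivations.

```python
import math
from typing import Dict, List, Tuple

Candidate = Tuple[str, str]  # (canonical form, logical form)


def features(nl: str, cf: str) -> Dict[str, float]:
    """Toy similarity features between an utterance and a canonical form."""
    nl_tokens = set(nl.lower().split())
    cf_tokens = set(cf.lower().split())
    return {
        # Number of distinct words shared by NL and CF.
        "word_match": float(len(nl_tokens & cf_tokens)),
        # Difference in the number of distinct words.
        "length_diff": float(abs(len(nl_tokens) - len(cf_tokens))),
    }


def score(weights: Dict[str, float], feats: Dict[str, float]) -> float:
    """Linear score w . f(NL, CF); unknown features get weight 0."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())


def rank(
    nl: str, candidates: List[Candidate], weights: Dict[str, float]
) -> List[Tuple[Candidate, float]]:
    """Return candidates with log-linear probabilities, best first."""
    scores = [score(weights, features(nl, cf)) for cf, _ in candidates]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    probs = [e / z for e in exps]
    return sorted(zip(candidates, probs), key=lambda item: -item[1])


# Hypothetical weights; in practice these would be learned from the dataset.
WEIGHTS = {"word_match": 1.0, "length_diff": -0.1}
CANDIDATES = [
    ("article whose publication date is 2004", "LF1"),
    ("author of article", "LF2"),
]
```

With these assumed weights, `rank("articles published in 2004", CANDIDATES, WEIGHTS)` places the first candidate ahead of the second, because it shares the token "2004" with the utterance.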
Recurrent Neural Networks (RNNs) have proved effective in some natural language processing (NLP) applications. For example, Long Short-Term Memory networks (LSTMs) have been used for performing sequence prediction in NLP applications, such as translation and natural language generation (Sepp Hochreiter, et al., “Long Short-Term Memory,” Neural Computation, 9(8):1735-1780, 1997; Ilya Sutskever, et al., “Sequence to Sequence Learning with Neural Networks,” Advances in Neural Information Processing Systems (NIPS), pp. 3104-3112, 2014; Tsung-Hsien Wen, et al., “Semantically Conditioned LSTM-based Natural Language Generation for Spoken Dialogue Systems,” EMNLP, pp. 1711-1721, 2015). These approaches, however, try to predict intrinsically sequential objects (texts), whereas a logical form is a structured object that is tree-like by nature and also has to respect certain a priori constraints in order to be interpretable against a knowledge base.