Aspects of the exemplary embodiment relate to extracting graphs of relations from texts, and find particular application in connection with a system and method for extracting graphs based on an ontology of relations.
Graph extraction (GE), as used herein, entails the extraction of explicitly structured sets of relations and attributes that link concepts and/or entities which are either explicitly mentioned in the input texts or only implicitly referred to. GE can be useful for various applications, including automatic or semi-automatic population of databases (DBs) or knowledge bases (KBs), and the generation of formal database or knowledge base queries from requirements or queries expressed in natural language.
It is widely acknowledged that annotating relations in text is costly, and a number of attempts have been made to develop unsupervised or weakly supervised methods. In particular, some approaches attempt to exploit ontological constraints to reduce supervision. These approaches consider pairwise relations (a pair of mentions connected by a certain type of relation), as opposed to a graph of relations, and require examples for learning, either in the form of graph annotations in texts (regular supervision) or in the form of relations stored in a database or knowledge base.
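To make the contrast between pairwise relations and a graph of relations concrete, the following minimal Python sketch shows one possible representation of each. The `Node` and `Edge` classes, the ontology type and relation names, and the example sentence "Alice moved to Paris in 2010" are hypothetical illustrations, not part of any cited method. The sketch assumes that a node without a text mention stands for a concept that is only implicitly referred to.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """A concept or entity in an extracted graph (hypothetical schema)."""
    node_id: str
    concept: str              # ontology type, e.g. "Person" (illustrative)
    mention: Optional[str]    # surface text, or None if only implicitly referred to


@dataclass(frozen=True)
class Edge:
    """A typed relation between two nodes (hypothetical schema)."""
    source: str               # node_id of the first argument
    relation: str             # ontology relation name (illustrative)
    target: str               # node_id of the second argument


# Pairwise view: each relation links exactly two explicit mentions.
pairwise = [("Alice", "movedTo", "Paris")]

# Graph view of "Alice moved to Paris in 2010": an implicit Residence node,
# which has no mention in the text, ties the person, the place and the date together.
nodes = [
    Node("n1", "Person", "Alice"),
    Node("n2", "City", "Paris"),
    Node("n3", "Date", "2010"),
    Node("n4", "Residence", None),
]
edges = [
    Edge("n4", "resident", "n1"),
    Edge("n4", "location", "n2"),
    Edge("n4", "startDate", "n3"),
]

# Nodes with no explicit mention are the ones a pairwise extractor cannot produce.
implicit = [n.node_id for n in nodes if n.mention is None]
print(implicit)
```

Under these assumptions, the date "2010" is connected to both "Alice" and "Paris" only through the implicit node `n4`, a link that a single mention-to-mention triple cannot express.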
For example, in the Never-Ending Learning framework, the authors propose a semi-supervised method for simultaneously learning categories and the relations between categories from web documents (see, T. Mitchell, et al. “Never-Ending Learning,” Proc. 29th AAAI Conf. on Artificial Intelligence (AAAI-15), 2015; and Andrew Carlson, et al., “Coupled Semi-Supervised Learning for Information Extraction,” Proc. 3rd ACM Int'l Conf. on Web Search and Data Mining (WSDM '10), pp. 101-110, 2010). Because it is semi-supervised, this approach requires a set of seed examples, and it can extract only pairwise relations. The method relies on patterns that recur across a large corpus of web documents in order to learn the most accurate relation patterns. It is thus not directed toward processing single documents.
For extracting graphs of relations, existing systems are either rule-based or fully supervised. For example, a rule-based method is described in Nikolai Daraselia, et al., “Extracting human protein interactions from MEDLINE using a full-sentence parser,” Bioinformatics, 20(5):604-611, 2004. In this method, an ontological graph is generated from a rich semantic parse of the sentence. The semantic parser is highly domain dependent and closely tied to the ontological structure to be produced. Thus, a new semantic parser needs to be generated for each new domain, which is time consuming. A fully supervised method is described in Andrew MacKinlay, et al., “Extracting Biomedical Events and Modifications Using Subgraph Matching with Noisy Training Data,” Proc. BioNLP Shared Task 2013 Workshop, pp. 35-44, 2013. The method aims to extract complex events in the context of the BioNLP-2013 Cancer Genetics task. Syntactic dependency subgraph patterns that describe various events are learned from annotated training data. New events are then extracted by approximate subgraph matching between the learned subgraph patterns and the subgraphs observed in the input.
A method for statistical entity extraction from the Web, including the extraction of attributes of the entities, is described in Zaiqing Nie, et al., “Statistical Entity Extraction From the Web,” Proc. IEEE, 100(9):2675-2687, 2012. The method extracts trees based on definitions of the entities and attributes to be extracted. However, the method requires training examples and extracts relations/attributes only for explicit text mentions.
A method for Abstract Meaning Representation (AMR) parsing is described in Keenon Werling, et al., “Robust Subgraph Generation Improves Abstract Meaning Representation Parsing,” Proc. 53rd Annual Meeting of the ACL and the 7th Int'l Joint Conf. on Natural Language Processing of the Asian Federation of Natural Language Processing, Vol. 1: Long Papers, pp. 982-991, 2015. This supervised method uses annotated training examples and outputs a graph of nodes linked by relations. It extracts relations/attributes only for explicit text mentions.
Hoifung Poon, “Grounded Unsupervised Semantic Parsing,” ACL (1), pp. 933-943, 2013, describes a grounded unsupervised semantic parsing method for translating natural language queries into database queries. The method relies on distant supervision and therefore requires labeled examples in a database for training. Once again, the method extracts relations/attributes only for explicit text mentions.