The Semantic Web provides a large amount of structured, interconnected data. The richness of this data offers new possibilities for research and industry and opens up new approaches in human-computer interaction. As more and more Resource Description Framework (RDF) data is contributed to the Semantic Web, the question arises of how users can access this body of knowledge in an intuitive way. In this context, Linked Data-driven question answering systems have recently attracted much attention, as they allow users, even those with limited familiarity with technical systems and databases, to pose questions in a natural way and gain insights into the available data.
One of the challenges in question answering over RDF-based data is the automatic mapping of natural language questions onto an appropriate SPARQL query representation, which subsequently connects a number of interlinked RDF repositories (i.e., translating the individual parts of a natural language question into their respective URI representations, as required by the underlying query language).
For example, when trying to answer the question “What are the side effects of medication XY?” with respect to the Unified Medical Language System data set, the name medication XY is to be mapped to the resource <http://linkedlifedata.com/resource/umls/id/C0220892>, and side effects are to be mapped to the predicate sider:SideEffect.
Most recent work on question answering over linked data that supports the automatic construction of queries from questions focuses on template-based triple translation and/or uses the YAGO or DBpedia ontology for the triplification process.
Most commonly, the task of predicate detection is tackled by using a template-slot-based approach (Unger et al., 2012). The input representation is matched against a given formal query template representation and subsequently populated with the appropriate slot values. The matching builds upon a rule-based decision tree that uses a fuzzy match algorithm. More precisely, these systems utilize a predefined set of synonym terms that are related to a specific predicate or entire query template.
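The synonym-based predicate detection described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of the cited systems: the synonym table PREDICATE_SYNONYMS, the function detect_predicate, and the 0.8 threshold are hypothetical, and the standard-library difflib similarity ratio stands in for whatever fuzzy match algorithm and rule-based decision tree a real system uses.

```python
from difflib import SequenceMatcher

# Hypothetical synonym sets, one per predicate. A real system would
# curate these lists by hand or derive them from a lexical resource.
PREDICATE_SYNONYMS = {
    "sider:SideEffect": ["cause", "side effect", "adverse effect", "lead to"],
    "sider-drugs:drugName": ["name", "called", "drug name"],
}


def detect_predicate(relation_term, threshold=0.8):
    """Return (predicate, score) for the best fuzzy synonym match.

    Returns None when no synonym reaches the similarity threshold.
    """
    term = relation_term.lower().strip()
    best_predicate, best_score = None, 0.0
    for predicate, synonyms in PREDICATE_SYNONYMS.items():
        for synonym in synonyms:
            # ratio() is a similarity in [0, 1]; 1.0 means identical strings.
            score = SequenceMatcher(None, term, synonym).ratio()
            if score > best_score:
                best_predicate, best_score = predicate, score
    if best_score < threshold:
        return None
    return best_predicate, best_score


if __name__ == "__main__":
    print(detect_predicate("causes"))
    print(detect_predicate("xyzzy"))
```

With this table, an inflected form such as "causes" still resolves to sider:SideEffect, while a term sharing no characters with any synonym is rejected. The sketch also makes the limitation of the approach visible: a relation term that is absent from every synonym set cannot be mapped at all.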
For example, within the Linked Life Data repository for the question “What drugs cause vomit?”, the relation term ‘cause’ would be mapped onto the predicate
rdf:type sider:SideEffect
so that a query template would be represented as:
    SELECT DISTINCT ?d ?n
    WHERE {
      ?o umls: <SLOT-VALUE> .
      ?d <PREDICATE-VALUE> ?o .
      ?d sider-drugs:drugName ?n .
    }
with the predicate value sider:SideEffect for cause and the slot value umlsconcept: C0042963 for vomit.
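Populating such a template amounts to placeholder substitution, as the following minimal Python sketch illustrates. The function fill_template and the constant QUERY_TEMPLATE are hypothetical names introduced here. The template string is taken from the text as given, without prefix declarations, so the result is illustrative rather than a query ready to be sent to an endpoint. The slot value is written without the space after the prefix, which is an assumption about the intended notation.

```python
# Template copied from the text; placeholders are literal marker strings.
QUERY_TEMPLATE = """SELECT DISTINCT ?d ?n
WHERE {
  ?o umls: <SLOT-VALUE> .
  ?d <PREDICATE-VALUE> ?o .
  ?d sider-drugs:drugName ?n .
}"""

PLACEHOLDERS = ("<PREDICATE-VALUE>", "<SLOT-VALUE>")


def fill_template(template, predicate, slot_value):
    """Substitute the predicate and slot placeholders with concrete values."""
    for placeholder in PLACEHOLDERS:
        if placeholder not in template:
            raise ValueError(f"template lacks placeholder {placeholder}")
    return (template
            .replace("<PREDICATE-VALUE>", predicate)
            .replace("<SLOT-VALUE>", slot_value))


# Assumed notation: "umlsconcept:C0042963" (no space after the prefix).
query = fill_template(QUERY_TEMPLATE, "sider:SideEffect", "umlsconcept:C0042963")

if __name__ == "__main__":
    print(query)
```

Plain string replacement is enough for a sketch, but a real system would need to validate or escape the substituted values before executing the query.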